Developer Principles with Markus Schirp

Markus Schirp & Adam Hawkins discuss the principles and practices that led to mutation testing and impact on effective daily work.

[00:00:00] Hello and welcome. I'm your host, Adam Hawkins. In each episode I present a small batch of theory and practices behind building a high-velocity software organization. Topics include DevOps, lean, software architecture, continuous delivery, and conversations with industry leaders. Now let's begin today's episode.

[00:00:26] Hello again, everybody. Welcome back to Small Batches. Today I'm talking to somebody you likely haven't heard about, and who is a bit different from the kind of people I typically have on the show. But I think this is a good thing. It's going to bring a different perspective to the show that we haven't really seen yet.

[00:00:46] So I want to introduce you to Markus. I met Markus way back, I don't know, 2012, at a conference in Wrocław. It was actually a Ruby conference, and Markus was doing a talk on testing, I believe, or code metrics, something of the sort. And we got to talking after the talk and kind of hung out through the conference, and it immediately struck me that this is a really smart guy who knows a lot more about software development than I do.

[00:01:21] And what really impressed me about Markus was his attention to principles and metrics. Now, I wasn't really so keen on the metrics way back then. But now, with my focus on DevOps and software delivery performance, these metrics come up again: things like developer efficiency, time from commit to production, waste, and measures of code quality.

[00:01:48] All of these are signals that can drive the software development process. So I thought it'd be good to have Markus on the show to discuss his principles for being a software developer. One thing I really like about Markus is that he does, in fact, actually have principles and sticks to them, and he uses these principles to inform his daily work and to change his workflow.

[00:02:14] And, if needed, even change tools to improve the metric. So I kind of came at this from the code quality side, because Markus has created a tool called Mutant. Mutant is a mutation testing tool for Ruby. In case you don't know, Ruby is a dynamic language, so that introduces all kinds of different complexities and non-deterministic behaviors, to put it lightly.

[00:02:47] But as Markus describes it, he is a dynamic language exorcist, which is just a really fun description. The point of Mutant is to take the code, modify the AST such that there's a change in the code, and run the tests. If they succeed, then something is wrong with the tests or the code, right? So think of it this way.

[00:03:14] If you were to manipulate the code, you know, change a greater-than to an equals, or an equals to a less-than, and the tests still passed, well, then the tests are inaccurate and need to change. Kind of the way that I think about it is that Mutant is a tool for testing your tests. Well, anyway, we'll talk about that a little bit. But what really connected with me was the idea of changing the way that we think about software quality and, if needed, creating tools to change the way that we can test.

[00:03:50] So anyway, I have Markus on the show today to discuss the principles and practices of being an effective software developer. So I give you my conversation with Markus Schirp.

[00:04:06] Markus, thanks for coming on the show.

[00:04:08] Thanks for having me.

[00:04:09] I'm excited.

[00:04:10] Yeah, me too. So I've already introduced you in my own words. Why don't you introduce yourself in your own?

[00:04:16] Okay. So I have many taglines as self-descriptions. Sometimes I say I'm a dynamic language exorcist. Sometimes I say I like to work on turning software development into an engineering discipline by finding missing axes of correctness and writing the missing tools.

[00:04:35] Sometimes I describe myself as a developer. Sometimes I describe myself as an entrepreneur because all of this is true in some form, right? Sometimes I contracted. Equities. Sometimes I do more like technology convergence. I did some InfoSec, so I'm really hard to put into one category.

[00:04:50] That's one of the reasons why I wanted to talk to you: you have a wide variety of experience.

[00:04:56] And one of the things I want to focus on is that when I originally met you, it was at a conference, and that was kind of when I got exposed more seriously to mutation testing. And to me, at the time, this was like a big light bulb in my head, because it was just totally new, and it is rare to actually find something totally new in tech.

[00:05:22] Right.

[00:05:23] But right now, if you just ask people, "Have you ever heard about mutation testing?", frankly, it's still a really, really rare topic for many.

[00:05:32] Yeah, no surprise, because even still, you know, TDD and different kinds of testing don't have a hundred percent mind share. But what really excited me about mutation testing was that it was just a whole other way to assess the quality of the software that we're building.

[00:05:50] And now, with the focus on continuous delivery, it's all predicated on having some way to verify the correctness of your software with high confidence, and mutation testing was one way to take that to a whole other order of magnitude. So maybe we start the conversation with: what was the process and logic behind creating...

[00:06:15] Well, maybe we ought to actually define mutation testing a little bit and start there.

[00:06:21] Let me try to just set the scene for this discussion. What's really important is that we first define some terms, so I can use these terms to explain it to people. So, the definition of a check.

[00:06:34] You're talking about tests and so on. That's not false, but a test is a very specific thing in software development: you write the test and you execute the test. To differentiate the various forms of testing, because you can also test manually, and you can review, which is also a form of passing a test, let's call it a check.

[00:06:52] So everything which happens between writing a line of code and getting it into production is a series of checks, checks on many levels. The first level is typically: does the parser accept it? And there is not much more to that one. And then you can run it, and you can hope that it passes the check of production, because production will do a check, right?

[00:07:15] You will see it in your error rates, if you're tracking them, which sometimes doesn't happen. Or, let's say, the business does not exist after a certain amount of time, which is the ultimate check. If you failed to deliver correct software, then you probably failed some checks before. Sometimes there are checks on whether the business idea does the job, but let's focus on the pure engineering checks.

[00:07:38] So, engineering checks up to production. As I mentioned, there is the parser. Then, in a typical dynamic language environment, you have boot-up time. Let's assume you just boot your application and try it out in development. That's a form of check, but we all know that we cannot exhaustively test everything by hand in development.

[00:08:03] And then you write some automated tests. Automated tests are also a form of check. Then let's assume the development team says: okay, we know that we could test the wrong thing, we could actually specify the wrong thing and implement it wrong, so let's do some reviews before it goes into production.

[00:08:23] It's not one developer, it's four eyes, six eyes, whatever the policy is. It's another check which has to be passed. And, funnily enough, GitHub came around and also named these things "checks". I'm not related to that at all, it's just coincidence. What's interesting here is that mutation testing is a form of, let's say, second-dimension check. Mutation testing takes your code and your tests as input and produces a report of changed semantics which still pass the tests.

[00:08:53] Each of these mutations, which is the domain term from mutation testing, each of the changes it can identify which still pass your tests, is a call to action for the developer, the reviewer, at whatever stage you decide to implement mutation testing. You can do mutation testing at the pre-commit stage.

[00:09:13] You can do it on CI. I've seen all these things. But the call to action of each alive mutation, which is typically represented as a diff, at least in my tool, is the question: why don't we use this changed code? And this changed code represents a semantic reduction: remove an argument, remove a method call from a chain, and so on.

[00:09:37] It's always a semantic reduction: something which the tool changed automatically, which still passes the tests, and which does less. So the call to action is: please, developer, assume this is an expert developer asking you, why don't we write it that way?

[00:09:57] Which does less, because an expert developer knows the best thing you can do is just remove code. If you can remove code, there's a high chance that's good. Sometimes the answer is: yes, we should remove it. In some cases it's actually: oh, this is very important, you would be breaking a business invariant. Then you had probably better write a test, because it's so important that it must not be removed.

[00:10:19] Then there's probably a missing test. And this is basically a way to generate a check based on other checks, because a test is a form of check. It's a series of things: you first need to write code, which then gets checked. So it's this second-dimension check, which produces,

[00:10:40] relative to the quality of the mutation testing engine, an actionable report to drive these decisions. Either it's semantics which can be removed, or it's important enough to add a test. Okay. That's it.
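The loop described above (take code and tests as input, try semantic reductions, report the ones that still pass) can be sketched in a few lines of Ruby. This is only a toy under stated assumptions: the `normalize` method, the hand-written list of reductions, and the helper names are hypothetical, and a real engine such as Mutant derives its mutations from the AST rather than from string substitution.

```ruby
# Toy mutation run. The "engine" here is a hand-written list of semantic
# reductions applied to a source string (an assumption for illustration).
ORIGINAL_SOURCE = <<~RUBY
  def normalize(name)
    name.strip.downcase
  end
RUBY

# Each mutation removes one method call from the chain.
MUTATIONS = {
  "remove .strip"    => ORIGINAL_SOURCE.sub(".strip", ""),
  "remove .downcase" => ORIGINAL_SOURCE.sub(".downcase", "")
}.freeze

# The test suite: each test receives an object that responds to #normalize.
TESTS = [
  ->(subject) { subject.normalize("ADAM") == "adam" }
].freeze

# Evaluate a source string into a fresh module so mutants never leak
# into each other or into the original.
def load_subject(source)
  container = Module.new
  container.module_eval(source)
  Object.new.extend(container)
end

def passes?(source)
  subject = load_subject(source)
  TESTS.all? { |test| test.call(subject) }
end

# A mutation is "alive" when the whole suite still passes against it.
def alive_mutations
  MUTATIONS.select { |_description, source| passes?(source) }.keys
end

raise "the original must pass before mutating" unless passes?(ORIGINAL_SOURCE)

alive_mutations.each do |description|
  puts "alive: #{description} (why don't we write it that way?)"
end
```

With this suite, removing `.strip` stays alive because no test feeds in surrounding whitespace. The two possible responses are the ones named in the conversation: delete `.strip`, or add the missing test.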

[00:10:52] So let's, you covered a lot of ground there. And I think like for me, it's not the first time I've heard some of these terms and we've talked about it, but let's try to make it concrete for the audience.

[00:11:03] So the idea here is that, let's say, you have some code, and this code may be, like, a comparison. The comparison might be greater than or equal to some value there.

[00:11:14] Really nice example. Let's go with greater-than-or-equals, yes. Greater-than-or-equals, which you can actually decompose into: a is greater than b, or a is exactly equal to b.

[00:11:25] So it's actually just a compressed form of the same statement, and we can decompose it into two. It's basically two branches and two comparisons connected with an "or" operator, right? What mutation testing will do is try to simplify this code and ask for proof that you actually want greater-than-or-equals and not just greater-than.

[00:11:46] Right?

[00:11:47] So the idea here is that you change the source code in some way and run the tests. And then, if the tests still pass, something is unspecified.

[00:11:59] And that's basically the call to action, right? If the tests still pass, that gets reported, as if an expert developer were asking: do you actually want greater-than-or-equals here, or could you actually use equals?
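To make the greater-than-or-equals example concrete, here is a small Ruby sketch. The two suites and the `survivors` helper are hypothetical. The point is only that `a >= b` decomposes into `a > b || a == b`, so each half is a candidate simplification that a test has to rule out.

```ruby
# The original comparison and its two reductions, as callable objects.
IMPLEMENTATIONS = {
  "a >= b" => ->(a, b) { a >= b },
  "a > b"  => ->(a, b) { a > b },
  "a == b" => ->(a, b) { a == b }
}.freeze

# Each test case is [a, b, expected result].
WEAK_SUITE   = [[2, 2, true]].freeze
STRONG_SUITE = [[2, 2, true], [3, 2, true], [1, 2, false]].freeze

# Names of all implementations that pass every case in the suite.
def survivors(suite)
  passing = IMPLEMENTATIONS.select do |_source, implementation|
    suite.all? { |a, b, expected| implementation.call(a, b) == expected }
  end
  passing.keys
end

puts "weak suite:   #{survivors(WEAK_SUITE).inspect}"
puts "strong suite: #{survivors(STRONG_SUITE).inspect}"
```

Under the weak suite the `a == b` reduction survives next to the original, which is exactly the question "could you actually use equals here?". The strong suite adds a strictly-greater case and a strictly-smaller case, and only the original is left.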

[00:12:09] Yeah. So just to state that again in shorter terms, the idea here for mutation testing is to verify. It's almost like a check of the tests themselves, right? So if you can mark it.

[00:12:23] I wouldn't say that. And first, a disclaimer: this is my definition of mutation testing.

[00:12:30] I'm the author of a mutation testing engine, and I've gone in my own direction, and my motivation defines the terms and so on, for the reason that I think I found a nice self-contained subset of reasoning. It need not be correct, but it's correct enough for me. And "testing of tests" is actually a really bad selling point in my opinion, because it's not focusing on the actual action. It's testing for the absence of simplifications in your code that still pass the tests, because it could also be that the tests are totally valid and the developer simply wrote more than the tests asked for, which is typical.

[00:13:08] It's not about testing tests at all. It's about proving, statistically proving, because it's not a 100% proof. So these tools are statistically proving that there is no easy-to-detect simplification of your code. It's like a first step. So when I review code, I typically do multiple passes.

[00:13:31] The first pass is purely visual. I just look at the syntax: is it nicely aligned and so on, just so I can trust my pattern matching and I can

[00:13:42] read the code and translate it to some kind of object tree in my brain. It has to be structured enough that I can trust this automated pattern matching into whatever semantics gets represented in my brain when I'm thinking about code. And then I need to think about: okay, what is the intention of the change, at a high level?

[00:14:02] Is the change implementing a feature? Then I need to make sure that it actually conforms at the business level. And then I would go through each line of code and think: is this actually required? Can we just make it shorter? So I would constantly do this in my brain, and mutation testing

[00:14:23] does the same, but much faster and at a lower sophistication level. But it doesn't cheat. Humans have presuppositions: "I've looked at this a thousand times, I can just skip it." We all have this. The best code reviewers I know have a 99% chance to not skip, and then find something.

[00:14:44] I'm not as disciplined. This is basically the reason Mutant exists, because I was subjected to some very good code reviews while growing up as a developer, and we just derived these operators. The engine just tries to verify that there is no simple-to-explain simplification of the code which still passes the tests.

[00:15:04] And for each violation of this property you get this call to action expressed as a diff: why not use this code? And your answer is either "sure", or "oh my God, this was something really important which I didn't specify". Because if it's not important enough to be specified, then conversely it means it should be removed from the code, right?

[00:15:25] If you have a strictly semantic-reducing mutation testing engine (there are other kinds; mine is strictly semantic-reducing), there are just these two outcomes. Either you can just use the simplified code, if you agree it's simplified. And it's very easy to agree that using an equals rather than a greater-than-or-equals is a simplification, because the truth table is much smaller.

[00:15:45] And you shouldn't think about it as verifying that the tests are correct. You verify that there is nothing extra in the code which isn't tested. Think about the typical TDD cycle. What are we supposed to do? We are supposed to write a test and see it fail. When we see the test fail, what are we supposed to do? Write just enough code to get the test to pass, nothing more.

[00:16:11] Yeah. And mutation testing enforces that, because mutation testing can tell you, let's say to a specific confidence, it's not perfect, whether something extra was added.

[00:16:22] So is it fair to say that given your mutation testing engine, if something passes the mutation test, then the functionality of the program is exactly what's necessary.

[00:16:36] It's exactly what's specified. Okay. Yeah, exactly, there's like a one-to-one mapping in the sense that...

[00:16:42] It's really a bit different, because the mutation testing engine doesn't understand the semantics or your intention. It just looks for simplifications which still pass the tests, and reports these under the assumption that such a simplification is something really interesting.

[00:17:00] Suppose that you can remove something. Let's say you have a method which makes two calls, and the return values are ignored. The first call is "deliver confirmation message". The second call is "charge credit card". So if Mutant can take out "deliver confirmation message",

[00:17:17] then that's really important information. You could argue: okay, obviously it's not important, because it has no test. No, actually there was a really important missing test. What if the next developer just assumes the confirmation message is sent somewhere else and just takes it out?

[00:17:33] There's no test, and it looked like just a side effect of something else, whatever. Then you have a regression. So what passing mutation testing proves, assuming the mutation testing engine can reach all your code, so, let's say, full coverage and no reported alive mutations, is that your code, and only to a statistical confidence, because the engine cannot see everything, only implements what your tests ask for.
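The confirmation-message example can be written out as a small Ruby sketch. Everything here is hypothetical: the `RecordingGateway` class, the `CHECKOUT` lambda, and the two hand-written statement-deletion mutants stand in for what an engine would generate.

```ruby
# A gateway that records side effects so tests can observe them.
class RecordingGateway
  attr_reader :events

  def initialize
    @events = []
  end

  def deliver_confirmation(order)
    events << [:confirmation, order]
  end

  def charge_credit_card(order)
    events << [:charge, order]
  end
end

# The method under test: two calls whose return values are ignored.
CHECKOUT = lambda do |gateway, order|
  gateway.deliver_confirmation(order)
  gateway.charge_credit_card(order)
end

# Statement-deletion mutants, written by hand for this sketch.
MUTANTS = {
  "remove deliver_confirmation" => ->(gateway, order) { gateway.charge_credit_card(order) },
  "remove charge_credit_card"   => ->(gateway, order) { gateway.deliver_confirmation(order) }
}.freeze

def observed_events(checkout)
  gateway = RecordingGateway.new
  checkout.call(gateway, :order_1)
  gateway.events
end

CHARGE_TEST       = ->(checkout) { observed_events(checkout).include?([:charge, :order_1]) }
CONFIRMATION_TEST = ->(checkout) { observed_events(checkout).include?([:confirmation, :order_1]) }

# Mutants that every given test still accepts.
def alive(tests)
  MUTANTS.select { |_name, mutant| tests.all? { |test| test.call(mutant) } }.keys
end

puts "charge test only: #{alive([CHARGE_TEST]).inspect}"
puts "both tests:       #{alive([CHARGE_TEST, CONFIRMATION_TEST]).inspect}"
```

With only the charge test, the original still executes both lines, yet deleting the confirmation call goes unnoticed. Adding the confirmation test kills that mutant.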

[00:18:02] That's really important, because the engine knows nothing about your intention.

[00:18:07] Right. Yeah. Okay. Maybe it's sort of like an NFR, a non-functional requirement, in a sense. Like, mutation testing is not going to tell you that your code is implemented right.

[00:18:19] So let's say in some cases, just an engine.

[00:18:22] So let's say you only implemented the greater-than-or-equals function exactly for the numbers which are in your tests: if a is one and b is two, then I expect that. If you implement a function like that, a mutation testing engine cannot detect that it was actually meant to be general.

[00:18:45] Right. Not in Ruby; we are talking about dynamic languages here. If you talk about languages with type systems, there it's much harder to cheat, because the type signatures actually force you to a certain degree. If you have a type parameter, the function is implemented under the predicate "for all".

[00:19:06] So you cannot make up your literals in the function implementation to do a static implementation. It's impossible. But in a dynamic language you could easily cheat a mutation testing engine: you just hard-code the inputs of the tests. There's nothing I can do with mutation testing to prevent that. An option to prevent that would be property testing, which is another alternative,

[00:19:27] another way of testing. Property testing takes a generator and a predicate as input, generates values against the predicate, and tries to find values which violate it. Right? It's a really rough explanation, but what I'm saying here is that it's all about finding the right set of tests, which offsets the inherent weaknesses of your development environment.
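A rough Ruby sketch of that gap, and of how a property check closes it. The cheating implementation, the totality property, and the `counterexample` helper are hypothetical and hand-rolled; a real project would more likely use an existing property testing library than this loop.

```ruby
# An honest implementation and one that hard-codes the test inputs.
HONEST_GTE   = ->(a, b) { a >= b }
CHEATING_GTE = ->(a, b) { [[2, 1], [2, 2]].include?([a, b]) }

# Example-based tests: [a, b, expected result].
EXAMPLES = [[2, 1, true], [2, 2, true], [1, 2, false]].freeze

def passes_examples?(gte)
  EXAMPLES.all? { |a, b, expected| gte.call(a, b) == expected }
end

# Property: for all integers a and b, a >= b or b >= a must hold.
TOTALITY = ->(gte, a, b) { gte.call(a, b) || gte.call(b, a) }

# Generate random pairs and return the first one violating the property,
# or nil if none was found within the given number of runs.
def counterexample(gte, property, runs: 200, seed: 42)
  random = Random.new(seed)
  runs.times do
    a = random.rand(-100..100)
    b = random.rand(-100..100)
    return [a, b] unless property.call(gte, a, b)
  end
  nil
end

puts "cheater passes examples: #{passes_examples?(CHEATING_GTE)}"
puts "cheater counterexample:  #{counterexample(CHEATING_GTE, TOTALITY).inspect}"
puts "honest counterexample:   #{counterexample(HONEST_GTE, TOTALITY).inspect}"
```

The hard-coded version satisfies every example, and no semantic reduction of it would be flagged as missing generality. The generated inputs expose it almost immediately.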

[00:19:52] If you choose a dynamic language, there is nothing preventing you from doing something really, really, really, really bad in your code, right? Except the tests. Because there are fewer constraints, you need to make sure that your tests have a much higher quality. You can say quality, but actually, your code and tests need to be much more in line than in a typed language. In a typed language, if you go for one with a very sophisticated type system, you can replace a lot of tests.

[00:20:23] But you cannot, even if you use dependent type systems, replace all tests. You can replace lots of boring tests.

[00:20:34] Yeah. You sort of like in a dynamic language like Ruby or JavaScript, you have to shift a lot of responsibility onto your test suite because that's the only way you have to verify functionality.

[00:20:46] The problem is that your test suite, without mutation testing, can never guarantee that there isn't any semantic in your code which could be reduced. Right. It's impossible. So let's go back to the example: first line of code, send confirmation message; second line of code, charge credit card. The problem here is that if you only cover the charge-credit-card line with a test, you still have 100% line coverage,

[00:21:14] because the send-confirmation line was executed too. Yeah. So the only way to find out whether this effect is actually covered by tests is to remove it and run the tests. And now it becomes a little bit complex, because in reality I've worked on projects which, with single-threaded execution, would run tests for 60 minutes.

[00:21:34] So can I just say: hey, I have 10,000 lines of code, for each line there is one mutation, and for each of those I need to run 60 minutes? Obviously not. So it's a little bit more sophisticated. You try to find the right tests for a given mutation. And it goes a little bit deeper. There are some preconditions for mutation testing, like a test suite which can be targeted to only execute specific tests, and a test suite which can be introspected.

[00:21:57] It can be a map that says: okay, this test is probably responsible for that code. Or the language allows a little bit of call tracing. Or you can do it convention-based. So it's not something you can just put on a test suite and assume it would produce good results. But the constraints of making a test suite mutation-testing capable also lead to a better test suite in itself, because you want a test suite which tells you what your units are supposed to be.

[00:22:29] If you have a unit test which cannot point to the class, or the public interface on a class, which is being tested, then it's not a unit test. So adding a mutation testing workflow to an existing test suite typically improves the tests even for the people who never run mutation testing.
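One way to picture convention-based test selection is a lookup from the code being mutated to the tests whose descriptions name it. The sketch below is an assumption for illustration only: the `Example` struct, the descriptions, and the prefix rule are hypothetical and do not describe any specific tool's selection algorithm.

```ruby
# A test example with its description and typical runtime in seconds.
Example = Struct.new(:description, :duration)

EXAMPLES = [
  Example.new("Checkout#charge charges the card", 0.2),
  Example.new("Checkout#charge rejects expired cards", 0.3),
  Example.new("Checkout#confirm sends the message", 0.1),
  Example.new("Invoice#total sums line items", 0.1)
].freeze

# Candidate prefixes from most to least specific:
# "Checkout#charge" first, then the enclosing class "Checkout".
def prefixes(subject)
  class_name = subject.split(/[#.]/).first
  [subject, class_name].uniq
end

# Pick the tests for the most specific prefix that matches anything.
def select_examples(subject)
  prefixes(subject).each do |prefix|
    matches = EXAMPLES.select { |example| example.description.start_with?(prefix) }
    return matches unless matches.empty?
  end
  []
end

selected = select_examples("Checkout#charge")
puts "run #{selected.size} of #{EXAMPLES.size} tests for a mutation in Checkout#charge"
```

A mutation in `Checkout#charge` then runs only the two matching tests instead of the whole suite, and a method with no dedicated tests falls back to everything that names its class. This only works if each test states which unit it covers, which is the constraint on the test suite mentioned above.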

[00:22:49] Right.

[00:22:50] So one of the things that I want to pull on a little bit here is, when you introduced yourself, you mentioned, what was it? Sort of...

[00:22:59] Language exorcist?

[00:23:01] No. I did really like that one, but it was something else. I think it was something you maybe mentioned way before, which was like metrics-driven development or something of that sort.

[00:23:13] So what I want to follow up on is how you got there. At some point in your career, you were writing code and you uncovered the use case for mutation testing. You figured out what it meant, and, you know, you adopted mutation testing into your workflow.

[00:23:33] The origin story. I obviously didn't invent mutation testing.

[00:23:36] I didn't even invent mutation testing for Ruby. I invented mutation testing to my liking for Ruby, and to other people's liking. It originated in the fact that I once was a core contributor on DataMapper 1. Well, I was a core contributor to the DataMapper 2 project, which got disbanded, and the ideas went on to become ROM and dry-rb, which I'm not related to at all. I followed those closely originally, but not the actual implementation projects.

[00:24:03] But what's interesting is that at the time we were planning DataMapper 2, Dan Kubb, who was the lead maintainer of DataMapper 1, wrote a relational algebra engine. And this relational algebra engine was tested with the only mutation testing tool available for Ruby at the time. It was called Heckle. I tried to contribute to that engine, and I hit many usability problems, and I was constantly bitching, blah, blah, blah.

[00:24:30] And everybody got a little bit pissed at me. I was really young, 19 years old, whatever. So I was like: hey, let's try. And it turned into a ten-year journey. But at the same time I was getting really good

[00:24:53] code reviews from him, the best of my life, because he's really capable of not getting clouded by his own judgment of the code. So he wouldn't just skim over it. He found lots of simplifications in my code, every time. So two things happened. He was using Heckle for Axiom. I wanted to contribute to Axiom and its plugins.

[00:25:12] And Heckle was mandatory. I had usability problems with Heckle, and I got reviewed by him. And eventually I found out that that's actually the same thing: a large part of his reviews would be covered by getting the code and the tests free of reported mutations. They intersected a lot. And eventually I ended up working with him on the same commercial team, and we could actually trace back

[00:25:38] several kinds of bugs in production to flaws which were missed in code review, and to missing mutations which would have surfaced those flaws. So we could expand the catalog of mutations. And in the end we actually could throw out lots of mutations, because they didn't follow the strong reduction principle, the semantic reduction principle. And that made Mutant kind of

[00:26:02] an automated first step of the code review. So over time we could actually say: I don't need to look at this code as long as there are alive mutations, because there will be flaws anyway. So we could transform our workflow in a way that we say: okay, we run Mutant on CI in incremental mode,

[00:26:22] so it only targets what changed in the current iteration, because the full run would take a long time. On the CI, the incremental run was very short, like 60 seconds.

[00:26:34] And a policy could be: if Mutant tries modifications which are equivalent to flaws you would have flagged in code review, or which end up as bugs in production, then conversely, why even look at code with alive mutations? Just send it back to the author and say: hey, fix this mutation. And this is how it actually entered our workflow

[00:26:59] and got refined all the time. Each time we ended up with a bug in production, we traced back the decisions we had made. In many cases we could find a missing mutation operator, and we would implement it and run the same thing again to verify that it catches the same bug.
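The incremental mode mentioned above, mutating only what changed in the current iteration, can be pictured as an intersection between changed lines and the line ranges of the mutation subjects. The sketch below is a hypothetical illustration of that idea, not Mutant's implementation: the `MutationSubject` struct, the file names, and the hard-coded `CHANGED_LINES` (which a real setup would derive from a version control diff) are all assumptions.

```ruby
# A unit that can be mutated, with the lines it occupies in its file.
MutationSubject = Struct.new(:name, :file, :line_range)

SUBJECTS = [
  MutationSubject.new("Checkout#charge",  "lib/checkout.rb", 10..24),
  MutationSubject.new("Checkout#confirm", "lib/checkout.rb", 26..31),
  MutationSubject.new("Invoice#total",    "lib/invoice.rb",  5..12)
].freeze

# Changed line ranges per file for the current iteration (assumed input).
CHANGED_LINES = { "lib/checkout.rb" => [12..13] }.freeze

def overlap?(left, right)
  left.begin <= right.end && right.begin <= left.end
end

def touched?(subject, changes)
  changes.fetch(subject.file, []).any? do |changed|
    overlap?(changed, subject.line_range)
  end
end

# Only subjects touched by the change are worth mutating on this run.
def incremental_subjects(subjects, changes)
  subjects.select { |subject| touched?(subject, changes) }
end

puts incremental_subjects(SUBJECTS, CHANGED_LINES).map(&:name)
```

Here only `Checkout#charge` is selected, so the run stays short enough to act as a gate before a human review starts.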

[00:27:16] And this is the entire idea of trying to approach engineering a little bit more scientifically, because you can try to trace back your decisions to axioms on the development team. So let's say, the decision to use single quotes: you always use the least powerful primitive that does the job.

[00:27:40] Let's say it's a universal axiom you can agree to. If you do not need to read the variable outside of a method, you use a local variable. If you need to read it from other methods on the same class instance, you use an instance variable. And if you need to, which is a very bad idea, read and write at a level higher, then you use a global.

[00:28:02] Or, even worse, you write it into the database, because it's global across your projects.

[00:28:06] Don't even give people these ideas, man. Don't even talk about such crazy things.

[00:28:12] So I'm just saying, the important thing is that you always use the least powerful primitive for every decision. It's one of the core principles, and then you can derive a lot of decisions from that and stop arguing.

[00:28:26] Because if you're arguing double quotes versus single quotes, it's simple if you agree on the axiom: if you use double quotes where you don't need them, that's a more powerful tool than you actually need. You can argue the same for data types. If there's a variable that can only have two states, it should probably be a boolean, not an empty string versus a string like "1".
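Applied to Ruby, the least-powerful-primitive rule could look like the sketch below. The `Greeter` class is hypothetical, and the comments record one possible reading of the axiom rather than a style guide from the conversation.

```ruby
class Greeter
  def initialize(name, formal)
    @formal = formal # two states only, so a boolean rather than "" or "1"
    @name   = name   # read by several methods, so an instance variable
  end

  def greeting
    # Needed only inside this method, so a local variable.
    # No interpolation needed, so single quotes.
    prefix = @formal ? 'Good day' : 'Hello'

    # Interpolation is needed here, so double quotes earn their place.
    "#{prefix}, #{@name}"
  end

  def shout
    greeting.upcase
  end
end

puts Greeter.new('Adam', false).greeting
puts Greeter.new('Adam', true).shout
```

Nothing here needs state beyond the instance, so no global appears. Each choice follows from the same question: what is the weakest construct that still does the job?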

[00:28:50] Right? So those are engineering decisions, and also discussion risks, which get removed if you get your team on board and teach your team how to apply the axiom down the chain. And in the end, we could even match up Mutant with this axiom, because it follows the composition of the least powerful primitive: a method which makes two calls is more complex than one which makes a single call, going back to our example of the confirmation message.

[00:29:19] So, let's say it's a consistent world of decision-making I discovered while writing Mutant, which leads to turning software engineering into something a little bit more deterministic. The obvious utopia is that developers with the same inputs, the same requirements, produce bitwise-identical code. It will never happen, but you can approximate it.

[00:29:42] I actually prefer to use this ideal, even if it's flawed and, you know, would never happen. But if you follow a decision-making tree in software development which can approach it, you have lots of positive side effects. I have been working in remote teams for 10 years now, which means that I cannot always tell people how I want things done, or watch them and interject.

[00:30:09] But if we follow the same decision tree, it's very likely that we end up with something so uncontroversial that the code just fits together. Yeah. So it's more of a statistical argument, right? It should reduce the likelihood of decisions which are based on personal bias. Because I think if you share this axiom, and you swallow your own technical-thought-leader pride and just swallow the axiom, like, okay, always use the least powerful primitive, so much

[00:30:40] discussion goes away. And if you get buy-in from large parts of the team, style doesn't exist anymore.

[00:30:50] It's great, because style is just, in my opinion, and it's a perspective I developed while writing Mutant and lots of other development tools, an excuse for "I don't have any arguments anymore."

[00:31:02] It's just subjective when it comes down to it, right? It's like, what is better than this and whatever.

[00:31:07] All right, let's take a quick break from today's episode so I can tell you about my other software delivery resources. First, I'm opening up my own software delivery dojo. My dojo is a four-week program designed to level up your skills in building, deploying, and operating production systems. Each week, participants will go through theoretical and practical exercises, led by me, designed to hone the skills needed for continuous delivery.

[00:31:32] I'm offering this dojo at an amazingly affordable price to Small Batches listeners. Spots are limited though, so apply now at softwaredeliverydojo.com. Well, if you want something free instead, I've got you there too. Find links to my free email courses and eBooks on any show notes page. My courses and eBooks cover topics in much more depth than I can cover on the podcast.

[00:31:54] They're great on their own, or even as a useful complement to topics covered on the show. Find all of my free resources at smallbatches.fm. All right, let's get back into the episode.

[00:32:06] So from this axiom, the least powerful primitive you need to do the job, you can derive a lot of things. You can talk about which kind of SQL you use.

[00:32:15] You can talk about which kind of delimiter you use. You can talk about fields: if you have three instance variables, you need to initialize them, you do not know in which order, and it makes semantically no difference, then you alphabetize them. Just because there is no inherent order doesn't mean you do not need to define one. So there's a default your team can agree to.

[00:32:36] So there's a lot of variance in code you can just remove. And this is one tool which just fits into that model. So my entire view on development got a little bit shaped by this: it's not an art, it's an engineering discipline. Okay, we may not all agree on that, but we probably have some universal ideas we could all agree on.

[00:33:00] I'm not stating that the one I just uttered here is the one we should all agree on, but there is something in that area that, if we could all consistently agree on it, would make the entire world of software engineering more self-consistent. So for example, if you look at medicine, there's the double-blind study.

[00:33:15] So if you do any kind of medical trial, it needs to be a double-blind study, because they figured out that if even the doctor knows what's going on, there is influence, and so on and so forth. So you need the participants of the study not to know whether they get the actual trial medication or the same thing as a placebo, and the doctors must not know either.

[00:33:36] And then you have a way to correlate everyone's results. And this is almost axiomatic. I think it breaks down from other simple principles of medicine; I don't know, I'm not a medical professional. There are principles that exist in software engineering which, if adopted by a large enough group, have big advantages.

[00:33:56] It's the principle of the least powerful primitive.

[00:34:03] Right. I recently had an interview, and the guest brought up kind of the same thread that you're bringing up. He mentioned that he had a transition earlier in his career where he called himself a software engineer, but he realized he was just a software developer because, you know, he was

[00:34:19] writing code, kind of just guessing at what to do and not necessarily using any kind of what he called scientific rationalism to decide what to do next. And I hear kind of the same theme coming from you in your career progression.

[00:34:37] Very this team of datum at the one and left by then very early.

[00:34:41] So I was introduced to these ideas at age 21, and I was consumed by consulting for seven years. We stayed with the same team for seven years for the same client, which is quite rare. And we had to eat all our own mistakes, but we also refined things.

[00:34:56] Significant. Yeah. So adding mutation testing was one of these techniques to improve the metrics that you use to assess the quality of the software

[00:35:08] that we talked about.

[00:35:10] So it improves the checks. It's just about creating a pipeline which deterministically produces value. Mutation testing is just one of these checks, and it shifts load away from the most expensive resource, which is human time: salaries. Salaries carry the highest weight because they cost the most. When you are running the day-to-day operations, AWS and all that stuff can cost a lot too, but that's just from the pure engineering point of view.

[00:35:37] Human time is the cost in software engineering. It's different if you build rockets, but if you do software engineering, it's human time, and we want to spend our time on human-worthy problems. Verifying low-level properties of the code, such as whether it is irreducible relative to the tests, is not a human-worthy task. It's an axiom about what we spend human time on. Or it's a rule, maybe not an axiom; an axiom is a little bit more transcendent.

[00:36:06] Let's say it's another rule: do not spend human time on something a machine can do. We do not run tests by hand, right? We replaced manual verification during development with writing some basic unit tests, because you have to run them thousands of times, after each change. So obviously we don't do this by hand. So why shouldn't we go ahead and use techniques which shift the verification of low-level properties, like "do I actually have a test which covers this bloody conditional?", to a machine?
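The question "do I actually have a test which covers this conditional?" can be sketched by hand in a few lines of Ruby. This is not how Mutant is implemented; the shipping rule, the two hand-written mutants, and the two suites are all hypothetical. It only shows the idea that a mutation which no test notices points at an unverified property.

```ruby
# Original implementation (hypothetical rule): free shipping from 100 upwards.
ORIGINAL = ->(total) { total >= 100 ? 0 : 5 }

# Hand-written "mutants": small semantic changes a mutation engine might try.
MUTANTS = {
  ">= replaced by >"      => ->(total) { total > 100 ? 0 : 5 },
  "condition always true" => ->(_total) { 0 },
}

# A weak suite: it checks a single value far away from the boundary.
WEAK_SUITE = ->(impl) { impl.call(500) == 0 }

# A stronger suite also pins the boundary and the paying case.
STRONG_SUITE = lambda do |impl|
  impl.call(500) == 0 && impl.call(100) == 0 && impl.call(99) == 5
end

# A mutant "survives" when the suite still passes against it.
def survivors(suite, mutants)
  mutants.select { |_name, impl| suite.call(impl) }.keys
end

puts "weak suite survivors:   #{survivors(WEAK_SUITE, MUTANTS).inspect}"
puts "strong suite survivors: #{survivors(STRONG_SUITE, MUTANTS).inspect}"
```

Both mutants survive the weak suite and neither survives the strong one, so the machine, not a reviewer, reports that the boundary was never tested.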

[00:36:36] This is what we want. Or, the other way round, let's talk about another classic example. In Ruby, you have two kinds of hash access. You can either use square brackets, which return the default value in case there is no key, or fetch, which you pass the key and which raises if the key is missing. Okay. So if I write a hash access and know the key exists, then using a code path which could return the hash default is actually not desired, because I know the key is there. And in general there are lots of call sites where you actually know the key should exist.

[00:37:20] And if you use the square bracket operator here and, for some reason, say a refactoring failure, the key is missing, you will propagate the default value, which is nil. It will be propagated somewhere and surface as an exception, and you have to painfully trace it down. So mutation testing will

[00:37:38] force you to use fetch. The one without extra arguments, obviously, because with a default argument it's a totally different case. It keeps insisting on fetch unless you can demonstrate that you really use the default value in at least one case. So for example, say you take a couple of parameters and read them.

[00:37:58] I know it's actually controller parameters, but it has the same interface, roughly the same semantics. Then when you write a test for such a case, it's very likely that you need to handle the case of a missing parameter. And by forcing you to do so, it's like an extra level of code review: "You didn't test the case where this line in the contact form is absent,

[00:38:29] or do you want to change it to fetch?" So either you change it to fetch or you prove to me it cannot be changed, right? That's what it's doing. So, I don't know how I ended up here.
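The semantic difference between the two kinds of hash access can be shown with plain Ruby. The `config` hash and its keys are made up for the illustration; `Hash#[]` and `Hash#fetch` behave as shown.

```ruby
# Hypothetical configuration hash; "port" is deliberately missing.
config = { "host" => "localhost" }

# Square brackets silently return the hash default (nil) for a missing key.
# The nil travels on and fails later, far away from the actual cause.
port = config["port"]

# fetch raises KeyError right at the point of the missing key.
message = nil
begin
  config.fetch("port")
rescue KeyError => error
  message = error.message
end

# fetch with a second argument states the default explicitly. This is the
# "totally different case": the default is intended and can be tested.
port_with_default = config.fetch("port", 5432)

puts "square brackets: #{port.inspect}"
puts "fetch raised:    #{message}"
puts "fetch default:   #{port_with_default}"
```

As described in the conversation, the mutation testing tool proposes fetch in place of square brackets; if no test fails after that change, no test depends on the nil default, and fetch states the intent more precisely.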

[00:38:40] So, okay. One of the things that you keep bringing up with regard to mutation testing is it playing the role of the automated code reviewer, which is something that I can attest to as someone who spent a lot of time doing code reviews on code that I was familiar with, code that I wasn't familiar with, code written by new developers, senior developers, whatever.

[00:39:02] exactly you can.

[00:39:04] That's the key. So you can use mutation testing pre-commit on your own. You can use mutation testing on CI, or if there is no policy for mutation testing, no setup, whatever, you can run it locally during development. And just ask yourself: is it worth teaching the new developers or is it not? That's not the point yet, but you can remove lots of low-level questions from the code review.

[00:39:27] And when you learn the properties of mutation testing, what it reports and what it does not, you don't need to check these low-level details anymore, because you can focus on the higher level. Did he use the wrong common strings? Is it correct? Does it fit from your business perspective? So it's just a low-level check, which shifts

[00:39:49] verification time from the experienced reviewer to a machine. Right?

[00:39:53] So that was the point I was making: I, as the reviewer, was doing these kinds of things manually. Like, I'm reviewing this code, and I see that you used square brackets instead of fetch. Why? You know, do you expect the key to be here or not?

[00:40:06] Because I see this, now I have to have this discussion with you to verify the semantics of what is written. Is it correct? Is it not? Right?

[00:40:15] Exactly. If you teach a newcomer, then the first time you see it you can do it by hand: you play the teacher, tell him to use fetch, and explain the semantic difference and so on.

[00:40:24] But it takes time, because human brains have inertia. He learns it, and then the square brackets come back in some later code review. Right. But the nice thing is, the mutation tester is a constant reviewer. It doesn't have a bad day. So over time you can just tell people: just get it to pass, just get your code past mutation testing.

[00:40:44] If you have any problem killing the mutations, come to me, I'll help you. So it's like a ladder for people. You can just hand them these tools. Just try for five minutes; if that doesn't work, come to me. The mutation testing system will never forget to check for square brackets on this call again, and maybe you will forget it the next time, even though I just taught you.

[00:41:02] We are all fallible. It's like a minimum threshold of reproducibility, which you get constantly.

[00:41:08] So you mentioned the term ladder, which I really like here. So for you adopting mutation testing was one step on the ladder. What were the next steps on this ladder towards like a higher level?

[00:41:22] And I got tired of many reasons.

[00:41:25] And one of the reasons was this problem of implementations which just map inputs to outputs without actually implementing the logic, which I mentioned before. This was something interesting. So let's go for an example. You have a test which asks: is two bigger than three? The answer is no. Is three bigger than two? The answer is

[00:41:45] yes. And you could write just these two statements into the programming language: if the left side of the input is two and the right side is three, return no; if the left side is three and the right side is two, return yes. Because it's just a table. And there is nothing the mutation testing engine can do about it, because the mutation testing engine will try to reduce the mapping for three and two, and that breaks the test.
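The "table" implementation from this example can be written out as a short Ruby sketch. The method name `greater_than_table` is hypothetical. Each branch is exercised by exactly one example test, so removing a branch breaks a test, yet the method does not implement the comparison at all.

```ruby
# A "table" implementation: it maps only the tested inputs to the expected
# outputs and returns nil for everything else.
def greater_than_table(left, right)
  if left == 3 && right == 2
    true
  elsif left == 2 && right == 3
    false
  end
end

# Example-based tests, one per table entry. Both pass.
example_tests_pass =
  greater_than_table(2, 3) == false &&
  greater_than_table(3, 2) == true

# An input outside the table reveals that nothing was really implemented.
untested_result = greater_than_table(5, 1)

puts "example tests pass: #{example_tests_pass}"
puts "5 > 1 according to the table: #{untested_result.inspect}"
```

The example tests say nothing about the inputs they do not mention, which is the gap the conversation turns to next.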

[00:42:08] Okay, it can't do anything. But if you have a typed language, you can do something about it, especially if you have a typed language which separates the concept of relations into a separate, let's say, type class. You can say this method must be implemented for all types which support relational operators, and so on, because here it is not a concrete type.

[00:42:33] It can be integer, double, float, natural, whatever. You cannot just hardcode such a table, because there is no concrete value to match on. It's much harder to hardcode the subset you chose to test and to write an implementation which just fulfills the test examples but not the intention of fulfilling all examples.

[00:42:57] And if you have a type system with universal quantification, so it has these forall types, then you can actually enforce these things. And so this is where I went. I went to, in my opinion, the most production-ready language which has these types, which is Haskell. There are many other languages.

[00:43:12] And I probably.

[00:43:15] Very strong opinions are fine. I have them too.

[00:43:19] And I tried to progress from just example-based checks to universality, so I can prove things for all inputs using the type system. Then I started to use a lot of property-based testing, which is prevalent in Haskell. There are really good tools to use: QuickCheck, and now there's Hedgehog.
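The step from example-based checks towards "for all" can be approximated in Ruby with a hand-rolled property check. This is a sketch only: it samples random inputs rather than proving anything, it is not QuickCheck or Hedgehog, and the helper names `trichotomy_holds?` and `check_property` are made up. The property used is that for two integers exactly one of a > b, b > a, or a == b holds.

```ruby
# Property: exactly one of "a > b", "b > a", "a == b" is true.
def trichotomy_holds?(impl, a, b)
  [impl.call(a, b) == true, impl.call(b, a) == true, a == b].count(true) == 1
end

# Tries the property on random integer pairs and returns the first
# counterexample as [a, b], or nil when every sampled pair satisfies it.
def check_property(impl, runs: 200, seed: 42)
  rng = Random.new(seed)
  runs.times do
    a = rng.rand(-1000..1000)
    b = rng.rand(-1000..1000)
    return [a, b] unless trichotomy_holds?(impl, a, b)
  end
  nil
end

# A real implementation and the hardcoded table from the example tests.
real = ->(left, right) { left > right }
table = lambda do |left, right|
  if left == 3 && right == 2
    true
  elsif left == 2 && right == 3
    false
  end
end

puts "real implementation counterexample:  #{check_property(real).inspect}"
puts "table implementation counterexample: #{check_property(table).inspect}"
```

Random sampling is weaker than the universal quantification a type system can enforce, but it already makes a hardcoded table much harder to get away with.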

[00:43:38] Then I started to realize that, okay, writing an executable which has lots of checks and is verified and so on is a good idea, but I always have to talk to the database, and the database already has a little bit of custom logic in it, right? Views and functions, all that stuff. And you all know you are not supposed to put logic in the database, but nobody explains the reason, and the reason is that it's hard to test.

[00:44:03] So I went out and wrote a test framework for SQL. There are existing test frameworks, but the ones I've found suffer from a big problem. Let's say you test a relation, a big join against some fixture data. The expectation actually has to manually repeat, in the host language, the entire result.

[00:44:24] So you join addresses against customers, you join your shipments, which have an FK to customers, against the customers, and you want to have an output. And then you painfully need to write the expectation. If you use the host language, Ruby, you end up painfully writing expectations, and in a big project with 20,000 lines of SQL, any kind of new column addition has a serious ripple effect on typing.

[00:44:52] You will not be able to maintain this kind of test. Then I eventually, for other reasons, tried to write some Postgres extensions, and they have a very nice way of testing: they use golden tests. You record your session, write it to a file, and execute the same thing again as a test.

[00:45:15] So there is an SQL statement and an expected output. And the test is just to run the statement again and compare it to the recorded output. If there's any difference, the test fails. It's a golden test, and it works really well to test SQL, because you have your fixtures and you just load them into the database.

[00:45:32] You have the select statement, a complex join as well, with lots of logic, domain-specific logic, discounts, whatever. And then you just record the expected outcome. You verify it once by hand, as you need to, for the query, and every time you change the query, you will see a diff presented by the golden test against the expectations.

[00:45:54] Because maybe you want to exclude certain customers. And it's a really bad description of a golden test, because I'm not used to explaining it yet; I only started recently. I'd explain it better in writing. I'm used to typing, I do not talk.

[00:46:14] I recently discovered that I had a blind eye on the database. So I couldn't put as much logic into the database as we should have, because in certain domains it's much better to push logic down to the database. Let's say, a trivial case: you compute the total for a cart.

[00:46:32] It's quite stupid to read lots of objects back and do some trivial aggregations to get the total.

[00:46:40] If you have a bad ORM, you have N+1 problems, you have all this mapping overhead and all this stuff. But if you could just teach the database to compute the total, you could just read the total from the database with one statement. The transaction context stays open for one statement only; it's much more efficient.
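The cart total example can be sketched without a real database. The `FakeDatabase` class below is a hypothetical in-memory stand-in that only counts round trips; its `cart_total` method stands in for a single aggregate statement, something like a `SUM` over the cart's line items. It shows the shape of the N+1 pattern versus the single statement, not real query performance.

```ruby
# Hypothetical in-memory stand-in for a database that counts round trips.
class FakeDatabase
  attr_reader :statements

  def initialize(line_items)
    @line_items = line_items
    @statements = 0
  end

  # One statement to list the ids of a cart's line items.
  def line_item_ids(cart_id)
    @statements += 1
    @line_items.select { |item| item[:cart_id] == cart_id }.map { |item| item[:id] }
  end

  # One statement per line item: the "+N" part of N+1.
  def line_item(id)
    @statements += 1
    @line_items.find { |item| item[:id] == id }
  end

  # Stands in for one aggregate statement computed inside the database.
  def cart_total(cart_id)
    @statements += 1
    @line_items.select { |item| item[:cart_id] == cart_id }
               .sum { |item| item[:price] * item[:quantity] }
  end
end

items = [
  { id: 1, cart_id: 7, price: 10, quantity: 2 },
  { id: 2, cart_id: 7, price: 5,  quantity: 1 },
  { id: 3, cart_id: 7, price: 1,  quantity: 4 },
]

# Application-side aggregation: load every object, then sum in Ruby.
n_plus_one = FakeDatabase.new(items)
app_total = n_plus_one.line_item_ids(7).sum { |id|
  item = n_plus_one.line_item(id)
  item[:price] * item[:quantity]
}

# Database-side aggregation: one statement, one value read back.
aggregated = FakeDatabase.new(items)
db_total = aggregated.cart_total(7)

puts "application side: total=#{app_total}, statements=#{n_plus_one.statements}"
puts "database side:    total=#{db_total}, statements=#{aggregated.statements}"
```

Both approaches yield the same total; the difference is four round trips for three line items versus one.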

[00:46:59] So I actually transformed over the 17 years as one of the single transformed e-commerce system called sprint was a pattern. And actually we really rebuilt the entire application. So the point is obvious: you need to test these things, and if you want to have this logic on the database side, you need to have these golden tests.

[00:47:18] Because you cannot hand-write lots of expectations against intermediate relations, which you also need to test. If you have a real online shop with real discounts, which are time-based, and all the special cases, you cannot just test against the totals, because you also need to test against the intermediary views.

[00:47:35] Right? Because it's so easy to create a bug that results in the same total for different reasons. So you need to also test the intermediates, but testing large intermediates is really hard if you have to hand-write the expectations. With a golden test, you can just say: record the session and write it into the expectation file.

[00:47:55] So you never have to type it all. And the next time I change the code, it presents a diff. Okay, so for example, the total changed from 100 to 200, and in the intermediate relation there's this new column. Do you want to accept this change, or is it unintended? If I say yes, it overwrites the expectations.

[00:48:12] The golden file simply gets overwritten, and I have nothing to type. I just need to review very carefully. Hm. Okay.
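The record, compare, and accept cycle of a golden test can be sketched in Ruby. This is not the framework discussed in the episode: `run_report` is a hypothetical stand-in for executing a SQL statement against fixtures, and `golden_check` is a made-up helper. A real tool would also print a diff on failure, which is omitted here.

```ruby
require "tmpdir"

# Hypothetical stand-in for running a SQL statement against fixture data
# and rendering the result as text.
def run_report(rows)
  rows.map { |row| "#{row[:customer]} | #{row[:total]}" }.join("\n") + "\n"
end

# Compares actual output with the golden file.
# Returns :recorded, :pass, :fail, or :accepted.
def golden_check(actual, golden_path, accept: false)
  unless File.exist?(golden_path)
    File.write(golden_path, actual) # first run: record the expectation
    return :recorded
  end

  return :pass if File.read(golden_path) == actual

  if accept
    File.write(golden_path, actual) # reviewed and intended: overwrite
    return :accepted
  end

  :fail
end

results = []
Dir.mktmpdir do |dir|
  golden = File.join(dir, "report.golden")
  before = [{ customer: "alice", total: 100 }]
  after  = [{ customer: "alice", total: 200 }]

  results << golden_check(run_report(before), golden)               # record once
  results << golden_check(run_report(before), golden)               # unchanged
  results << golden_check(run_report(after), golden)                # total changed
  results << golden_check(run_report(after), golden, accept: true)  # accept change
  results << golden_check(run_report(after), golden)                # new baseline
end

puts results.inspect
```

The expectation is never typed by hand; the only manual work left is reviewing the change before accepting it.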

[00:48:19] So is it safe to say that your kind of progression here is just exploring a lot of different ways for different kinds of testing? It seems like a lot

[00:48:29] of this, it's not about testing. It's about improving the confidence in my development.

[00:48:35] What's the right tool for the area I'm working in. Right.

[00:48:40] That makes sense.

[00:48:41] It's because of the absence of a really good type system in SQL: you can't have foralls. PostgreSQL is a really good RDBMS, but its type system is lacking, so I obviously have to backfill with tests in some way. Haskell has a good type system,

[00:48:55] so I do not write as many tests. Ruby has no type system, so I run mutation testing against the logic. The logic is basically: there are some weaknesses in the tools I use, and I try to fill the gaps.

[00:49:08] Yeah, right. So that sort of mirrors something in my own progression, which is, yes, the goal is not to write tests.

[00:49:17] The goal is to increase confidence that things are working as expected

[00:49:22] Confidence, yeah. Trust and confidence are an emotion, though; determinism is a little bit more precise. It doesn't have a bad time. I mean, you want to have a development process which deterministically adds value to the product.

[00:49:35] Well put. I think that's just where we should leave it, because that's sort of the high-level goal of all the engineering work we do: do it deterministically.

[00:49:45] And I'm also an entrepreneur. So then if you talk to the independent lockers, it's different. But if you just focus on engineering, it's about running a process which increases determinism. You can also put it inversely, which is: avoid depending on human discipline in your process, because human discipline is, by research, failing all the time.

[00:50:07] Well put. So, thanks for coming on the show. It was so much fun to talk to you again. I haven't talked to you for a long time. I feel we could probably talk for a long, long time. Thanks for listening to me; it's the first time I actually had the chance to explain this outside of my peer group. Oh,

[00:50:28] my pleasure.

[00:50:28] So is there anything you'd like to leave the audience with before we go?

[00:50:32] Yes. So if you're on Ruby and you're interested in using a determinism-improving tool: Mutant used to be open source. I had to pull it from open source. That's an interesting story, which doesn't fit here right now.

[00:50:45] If you want to try it out, find me on Twitter. I'm happy to give demos. It's free to use for open source, but you have to sign up. I need to track who is using it, out of interest and just to make sure that I can make a distinction from commercial usage. So feel free to hit me up and ask questions.

[00:51:04] I love questions. Love to talk about.

[00:51:05] Yeah, as you can tell. So for the listener, if you want to find out more about Mutant or the things that Markus is working on, just go to smallbatches.fm. There'll be links to everything that Markus just talked about, if you want to check them out. And with that, we say adieu.

[00:51:20] You've just finished another episode of Small Batches, the podcast on building a high-performance software delivery organization. For more information, and to subscribe to this podcast, go to smallbatches.fm. I hope to have you back again for the next episode. So until then, happy shipping.

[00:51:41] Like the sound of Small Batches? This episode was produced by Podsworth Media. That's podsworth.com.

Creators and Guests

Markus Schirp
Guest
Markus Schirp
Master of Disaster. Mutation testing. Dynlang Exorcist. Expat German.