Drift into Failure
You’re listening to this podcast because you want to improve. You want to become a better developer, manager, or leader. This podcast is a great start, but now I have something to help you find excellence. This is the official Small Batches Way study guide!
The study guide charts a path to software delivery excellence from the best books, ideas, and practices. The path has four parts: understanding TDD, understanding software architecture, understanding production operations, and understanding continuous delivery.
Get it for FREE at TheSmallBatchesWay.com.
Hello and welcome to Small Batches with me, Adam Hawkins. In each episode, I share a small batch of the theory and practices behind software delivery excellence.
Topics include DevOps, lean, continuous delivery, and conversations with industry leaders. Now, let’s begin today’s episode.
A friend recommended I read Drift into Failure by Sidney Dekker. I really enjoyed reading this one.
The book is a wonderful introduction to systems thinking. My favorite quotes on systems thinking now come from this book. That’s a surprising upset over The Fifth Discipline and The New Economics.
The premise is that five factors (production pressure, decrementalism, initial conditions, unruly technology, and protective structures) contribute to a slow drift into failure.
Dekker explains drift into failure with stories. This is my favorite thing about the book and why I had fun reading it.
The stories were not about software, but that did not make them any less relatable. If you’ve watched the HBO series Chernobyl, then you know what I’m talking about.
Each story is an opportunity to read drift into the events. Dekker’s core message is that we need to think up and out, not down and in.
“Down and in” refers to cause-and-effect thinking. This applies nicely to closed-loop systems. This mental model looks for linear causality and broken parts that caused incidents.
This mental model does not fit today’s complex systems.
Consider the Columbia space shuttle disaster. The shuttle broke apart during re-entry because a piece of foam insulation struck the left wing during launch, damaging the thermal protection system.
It’s tempting to say the small piece of foam that hit the wing “caused” the disaster. That’s a reductionist view. We must go up and out from there if there’s to be any chance of learning.
Here’s one question that comes from that line of thinking: What led to the foam warnings being reclassified from flight safety incidents to routine maintenance?
That classification change does a lot. It’s a step away from the original design into something else. The original engineers considered foam a serious problem. Now, it’s routine maintenance. That impacts the flight safety checklists. Those checks are a protective structure aimed at ensuring successful flights. I’m sure the checklist was complete before Columbia took off, though that didn’t make it safe. It created the illusion of safety.
But why the classification change? Probably a few factors. Consider the feedback loop of prior successes. More means more. There were successful trips with foam as routine maintenance. No problems. It should work again in the future, right? More means more.
Then of course there is the ever-present production pressure. The public and bureaucrats want more shuttle flights. The shuttle was effectively sold to everyone as a space bus, so get that thing moving! Delivery and cost pressures all manifest downstream. You can easily see someone making a management decision between safety and delivery. Like: “Oh, we have bigger fish to fry! Launch is in two months. We’ll push maintenance to later. If we don’t make this launch, then our program won’t be around for maintenance!”
These countless decisions all contribute to the final outcome: the loss of the shuttle and its entire crew.
Taking this “up and out” view does not point to a single cause. Instead, it’s a narrative for understanding, not for placing blame.
All right, that’s all for this batch. Head over to https://SmallBatches.fm/92 for links to recommended self-study on systems thinking, safety, and ways to support the show.
I hope to have you back again for the next episode. So until then, happy shipping!