Queueing Theory
Queueing theory is a field of mathematics that describes waiting lines. The math models how long it takes to get through the queue by relating how work arrives at the queue and how work is processed.
Don’t balk at the math. This applies directly to software delivery because the work we do, and the deploys themselves, all flow through queues. Understanding queueing theory provides insight into the systems we work in.
Let’s consider some example queues.
Imagine you’re ready to check out at the grocery store. You see different lines of people waiting for different registers. Which do you choose? Probably not the one with the family with an overloaded cart and a handful of coupons. How about the line with a single person holding a six-pack? That feels better. Cross your fingers that you guessed right.
This example shows two important variables in queueing theory. The first is how many servers are pulling items from the queue. Here each register is a single server with its own queue. The second is the variability in how long it takes a server to do the work.
We’ve all been stuck behind that one customer who has problems with the register or needs a special discount. This demonstrates variability’s impact on overall performance. The result is that some lines will be much faster than others since a problem checking out one customer impacts all customers waiting for that register.
Let’s consider another example.
Imagine you’re waiting for airport check-in in one of those long lines that snake through the concourse. There’s a handful of agents taking the next available passenger. No choice here; you just wait in line. The airport workers notice the line is getting too long, so they add more agents. Next thing you know, you’re checked in and on to security.
This example demonstrates how adding more servers, thus parallelizing the work, increases overall performance. We already have a term for this. We call it "horizontal scaling". This reduces variability’s impact on overall performance because if one passenger requires a longer check-in, the next passenger goes to the next available agent. The queue keeps moving.
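To make this concrete, here’s a minimal simulation sketch in Python comparing the two setups: grocery-style dedicated lines versus an airport-style shared line. Everything here is an illustrative assumption, not a measurement: four servers, Poisson arrivals at roughly 80% utilization, exponential service times, and customers picking a register at random.

```python
import heapq
import random

random.seed(42)

SERVERS = 4          # registers / agents
CUSTOMERS = 20_000
ARRIVAL_RATE = 3.2   # customers per minute -> ~80% utilization overall
MEAN_SERVICE = 1.0   # minutes per customer

# Arrival times: Poisson process. Service times: exponential, so a few
# customers take far longer than average (the coupon shopper).
arrivals = []
t = 0.0
for _ in range(CUSTOMERS):
    t += random.expovariate(ARRIVAL_RATE)
    arrivals.append(t)
services = [random.expovariate(1 / MEAN_SERVICE) for _ in range(CUSTOMERS)]

def dedicated_lines():
    """Grocery store: each customer picks a register at random."""
    free_at = [0.0] * SERVERS
    waits = []
    for arrival, service in zip(arrivals, services):
        line = random.randrange(SERVERS)
        start = max(arrival, free_at[line])
        waits.append(start - arrival)
        free_at[line] = start + service
    return sum(waits) / len(waits)

def shared_line():
    """Airport: one line, next available agent takes the next passenger."""
    free_at = [0.0] * SERVERS   # min-heap of when each agent frees up
    heapq.heapify(free_at)
    waits = []
    for arrival, service in zip(arrivals, services):
        soonest = heapq.heappop(free_at)
        start = max(arrival, soonest)
        waits.append(start - arrival)
        heapq.heappush(free_at, start + service)
    return sum(waits) / len(waits)

print(f"dedicated lines: avg wait {dedicated_lines():.2f} min")
print(f"shared line:     avg wait {shared_line():.2f} min")
```

Run it and the shared line wins by a wide margin. A slow customer ties up only one agent while the other three keep draining the common queue, which is exactly why pooling blunts variability.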
One last example that’s common in product development.
Imagine you’re leading a team. You don’t have enough people on the team to handle the tasks filling the backlog. You want to get the most work done, so you ensure each team member always has assigned work. New work keeps coming in. Work piles up. It takes longer and longer to finish anything. Eventually burnout sets in.
This last example demonstrates capacity utilization’s impact on throughput. If capacity utilization is too high, then there’s no overhead for variability in processing time. This is certainly true in product development when every JIRA ticket feels like a bespoke piece of work.
The variability does not even have to come from the work itself. Inevitably something will interrupt the work or unplanned work will be injected into the queue. Pagers buzz. Bug reports come in. Someone’s machine is broken. All of these are common in IT.
This brings up another important variable in queueing theory: how long the queue is. The team in this example needs to tackle both ends of the problem to exit the death spiral. One end is reducing the number of items in the backlog. The other is reducing the workload on each team member to leave room for the variation in each ticket and the inevitable unplanned work.
These are not anecdotal suggestions. These are truths found in queueing theory.
Kingman’s formula approximates the average waiting time in a queue. It demonstrates the negative impact of high utilization on wait times. Wait times are reasonable under 60% utilization but start doubling after that. This is because there’s no free capacity to absorb variability in processing times.
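In symbols, Kingman’s approximation for a single server says the average wait is roughly (u / (1 − u)) × ((Ca² + Cs²) / 2) × mean service time, where u is utilization and Ca and Cs measure arrival and service variability. A minimal sketch, with my own parameter names:

```python
def kingman_wait(utilization, mean_service=1.0, ca=1.0, cs=1.0):
    """Kingman's approximation for average wait in a single-server queue.

    ca, cs: coefficients of variation of interarrival and service times
    (1.0 for both matches a plain M/M/1 queue).
    """
    variability = (ca**2 + cs**2) / 2
    return (utilization / (1 - utilization)) * variability * mean_service

for u in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99):
    print(f"{u:.0%} utilization -> wait ~{kingman_wait(u):.1f}x the service time")
```

The hockey stick is plain in the output: about 1x the service time at 50% utilization, 4x at 80%, 9x at 90%, and 99x at 99%. The last slices of utilization are catastrophically expensive.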
Now you may be thinking: "OK Adam, I’ll focus on standardizing work to reduce variability in processing time and keep capacity utilization down. Problem solved".
Almost. That only accounts for work after it has arrived at the queue. How work arrives at the queue is another variable in queueing theory.
In our case, the work is random and independent. The specific math changes based on assumptions like these, and on the queue discipline, say first-in-first-out versus last-in-first-out. Luckily, there’s a law that applies to queues irrespective of these assumptions.
Little’s Law states that average wait time equals queue size divided by processing rate. The remarkable thing about Little’s Law is that wait time depends only on queue size and processing rate. Nothing else matters.
We can control processing rate by adjusting the number of servers and tackling variability. Or we can tackle the numerator instead and focus on reducing queue size. That has a larger impact on wait times and applies regardless of any tuning to the processing rate.
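A quick worked example, with hypothetical backlog and throughput numbers:

```python
def average_wait(queue_size, processing_rate):
    """Little's Law in steady state: wait = queue size / processing rate."""
    return queue_size / processing_rate

# A hypothetical team: 40 tickets queued, finishing 10 per week.
print(average_wait(40, 10))  # 4.0 weeks
# Doubling throughput (more people, standardized work) halves the wait.
print(average_wait(40, 20))  # 2.0 weeks
# Halving the backlog gets the same result without adding capacity,
# and the shorter queue also lowers utilization, which compounds
# through the Kingman effect shown earlier.
print(average_wait(20, 10))  # 2.0 weeks
```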
Queueing theory leaves us with two important lessons that apply directly to knowledge work.
Lesson one: loading people and resources to 100% creates exponential wait times.
Lesson two: prioritize keeping queue size down since that reduces utilization and has a larger impact on throughput than addressing processing rate.
There are also a few rules of thumb for applying these lessons.
Focus on keeping work-in-progress (or WIP) low. WIP limits act as a backstop for limiting capacity utilization.
Strive for an optimal batch size. Hint: smaller batches are better, but find the optimal size for your system. There is such a thing as too small a batch size.
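One way to see why: each batch pays a fixed transaction cost (say, the overhead of a deploy), which favors big batches, while holding cost grows with batch size as finished work sits undelivered, which favors small ones. A toy sketch with made-up costs:

```python
def cost_per_item(batch_size, transaction_cost=100.0, holding_rate=2.0):
    """Toy batch-size U-curve: transaction cost amortized per item,
    plus holding cost for the average time an item sits in the batch."""
    amortized_transaction = transaction_cost / batch_size
    average_holding = holding_rate * batch_size / 2
    return amortized_transaction + average_holding

for size in (1, 2, 5, 10, 20, 50):
    print(f"batch of {size:>2}: cost per item ~{cost_per_item(size):.0f}")
```

With these made-up numbers the curve bottoms out around a batch of ten. Larger batches drown in holding cost; a batch of one pays the full transaction cost every single time.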
If you’d like to learn more about queueing theory and its impact on software delivery, then I recommend the book The Principles of Product Development Flow by Donald Reinertsen. Or, if you want something less technical, then I recommend Making Work Visible by Dominica DeGrandis.