Monte Carlo Simulations for Capacity Planning
When we estimate software stories, we decompose the problem into questions that we can reason about independently, and express our uncertainty about those questions in ranges. The range expresses our degree of uncertainty about the effort required. We can reason about an individual software story, but we can’t reason about all of the possible outcomes of an array of dependent stories needed together for a targeted software release. Monte Carlo simulations allow us to delegate that aspect of complexity to some very bright math.
When we use discrete estimates for stories, it’s a simple matter to roll them up into an estimate for the delivery of a collection of stories. Say we have stories that are estimated as follows:
The roll-up is 42 hours. It is simple to calculate, and likely somewhere in the ballpark, but there are a couple of problems here. First, the precision is deceptive: it signals that the analysis is over. Worse, it encourages others, from scrum masters to stakeholders, to take the number as a target.
Range estimates help keep everyone honest by allowing us to express uncertainty in the spread of the range. For the same stories, maybe we’ll get range estimates something like this:
Now our roll-up is not very straightforward. What if all the stories come in high? What if they all come in low except the biggest one, which comes in over the high end? Range estimates are more realistic, but the combined outcomes are too complicated to model in a deterministic way.
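To see why the extremes alone tell us so little, here is a minimal sketch of the two bounding cases. The four (low, high) ranges are hypothetical stand-ins chosen for illustration, not the figures from the example above, and `rollup_bounds` is a name invented for this sketch.

```python
# Hypothetical (low, high) range estimates in hours; illustrative only.
ranges = [(3, 9), (5, 13), (7, 17), (8, 22)]

def rollup_bounds(ranges):
    """Return the all-low and all-high totals for a list of (low, high) ranges."""
    low_total = sum(low for low, _ in ranges)
    high_total = sum(high for _, high in ranges)
    return low_total, high_total

best, worst = rollup_bounds(ranges)
print(f"every story comes in low:  {best} hours")
print(f"every story comes in high: {worst} hours")
```

The two totals bracket every possible outcome, but they say nothing about how likely any total in between is, which is the question the simulation answers.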
Also, consider that often the estimate of one story depends on the outcome of another; how do you take that into account? How does that affect an array of possible outcomes?
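One way to think about that kind of dependency is to let the outcome of one story change the range we sample for another. The sketch below is only an assumption about how such a rule might look: the ranges, the threshold, and the function name `sample_dependent_pair` are all hypothetical.

```python
import random

def sample_dependent_pair(rng):
    """Sample two stories where the second story's range depends on the first."""
    a = rng.uniform(5, 13)  # hypothetical range for story A, in hours
    # Hypothetical rule: if A overruns its midpoint, B's range widens upward.
    if a > 9:
        b = rng.uniform(7, 22)
    else:
        b = rng.uniform(7, 17)
    return a, b

rng = random.Random(42)  # fixed seed so the run is repeatable
totals = [sum(sample_dependent_pair(rng)) for _ in range(10_000)]
print(f"mean total for the pair: {sum(totals) / len(totals):.1f} hours")
```

Because the rule is applied inside each random scenario, the dependency flows through to the combined outcomes without us having to enumerate the cases by hand.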
Now consider that instead of having four stories (or any components of work) independently estimated, you have twenty or more to contend with. This is closer to the reality of what we need to model if we want to understand capacity planning. A Monte Carlo simulation gives us the means to incorporate an array of range inputs and calculate probable outcomes.
The advantage of this approach is that it allows us to focus on what we can know, and we don’t have to make assumptions about what we don’t know. We can reason about the factors that drive each estimate: we analyze the story, identify what we’re most uncertain about, and answer the questions that reduce our uncertainty, iteratively refining the estimate until we arrive at an acceptable degree of uncertainty. That way, estimation is not relegated to guesswork but becomes a comprehensible process of analysis.
What we can’t reason about is all of the possible outcomes of all of the stories, all together. That’s the part we delegate to the Monte Carlo simulation. The range estimate of each story is our input, from which our simulation generates an array of 10,000 (or more) randomized possible outcomes. The totals typically cluster in a roughly bell-shaped distribution, which is rendered in the form of a histogram of probable outcomes.
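The process just described can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under stated assumptions, not a definitive implementation: the ranges are hypothetical, each story is assumed to be independent and uniformly distributed within its range, and the names `simulate_totals` and `text_histogram` are invented for this sketch.

```python
import random
from collections import Counter

# Hypothetical (low, high) range estimates in hours; illustrative only.
ranges = [(3, 9), (5, 13), (7, 17), (8, 22)]

def simulate_totals(ranges, trials=10_000, seed=1):
    """Draw one value per story per trial (uniform, by assumption) and sum them."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.uniform(low, high) for low, high in ranges)
            for _ in range(trials)]

def text_histogram(totals, bin_width=4):
    """Render the totals as a plain-text histogram, one line per bin."""
    bins = Counter(int(t // bin_width) * bin_width for t in totals)
    lines = []
    for start in sorted(bins):
        bar = "#" * (bins[start] * 200 // len(totals))  # scale bar to share of trials
        lines.append(f"{start:>3}-{start + bin_width:<3} {bar}")
    return "\n".join(lines)

totals = simulate_totals(ranges)
print(text_histogram(totals))
```

A uniform draw is the simplest choice; a triangular or normal draw per story would be just as easy to swap in if we believe outcomes near the middle of each range are more likely.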
Here are the Monte Carlo results for our four-story example above, based on 10,000 random scenarios:
If we have 40 stories instead of 4, it won’t be any harder to understand the results, and the effort to estimate the additional stories is merely linear. The exponential growth in complexity is subsumed in the model.
Now we can say the probable outcome is between 36 and 48 hours, without losing sight of the fact that 58 or more hours shouldn’t be entirely unexpected. This is an improvement over a discrete-value roll-up of 42, in part because a common anti-pattern is for estimates to be adopted as a target.
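Statements like "between X and Y hours" can be read directly off the simulated totals as percentiles. The sketch below assumes hypothetical ranges and a uniform draw per story, so its numbers are illustrative and are not the results quoted above; `simulate_totals` and `outcome_percentiles` are names invented here.

```python
import random
import statistics

# Hypothetical (low, high) range estimates in hours; illustrative only.
RANGES = [(3, 9), (5, 13), (7, 17), (8, 22)]

def simulate_totals(ranges, trials=10_000, seed=1):
    """Sum one uniform draw per story for each trial (assumed model)."""
    rng = random.Random(seed)
    return [sum(rng.uniform(lo, hi) for lo, hi in ranges) for _ in range(trials)]

def outcome_percentiles(totals, points=(10, 50, 90)):
    """Return the requested percentiles of the simulated totals."""
    # quantiles(n=100) yields 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = statistics.quantiles(totals, n=100)
    return {p: cuts[p - 1] for p in points}

pct = outcome_percentiles(simulate_totals(RANGES))
print(f"80% of scenarios land between {pct[10]:.0f} and {pct[90]:.0f} hours")
print(f"median scenario: {pct[50]:.0f} hours")
```

Reporting a band plus a median keeps the tail visible, which is harder to do once a single number has been written on a slide.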
Putting the development team under pressure to try to meet a target works against the Lean principle of a focus on quality, and usually doesn’t even serve the intended purpose of budgetary discipline. Working from range estimates provides a sufficient degree of accountability, holding the development team to financial constraints while they focus on building the right thing. Range estimation is an excellent tool to fight back against the malpractice of treating estimates as targets.
Before I came here I was confused about this subject. Having listened to your lecture I am still confused. But on a higher level.
— Enrico Fermi
A Monte Carlo simulation improves our process: where a discrete-value target estimate is relatively impervious to new information, a Monte Carlo model is a veritable invitation to update as new information arrives. The automatic recalculation of the distribution of probable outcomes gives project managers the tools to move beyond the fixation on a target.
The ease of recalculation encourages the practice of an iterative hunt for the most significant risks. Risk means uncertainty, so mitigation of risk is a reduction of uncertainty. The reduction of uncertainty is expressed as a smaller range estimate, which in turn produces a tighter distribution of probable results. Thus, the Monte Carlo simulation is not merely a way to model probable outcomes, but a tool in the process of improving outcomes.
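The effect of narrowing one range can be checked by rerunning the simulation and comparing the spread of the totals. In this sketch the ranges are hypothetical, each story is assumed to be an independent uniform draw, and `total_spread` is a name invented for illustration.

```python
import random
import statistics

def total_spread(ranges, trials=10_000, seed=7):
    """Return the standard deviation of simulated totals for the given ranges."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    totals = [sum(rng.uniform(lo, hi) for lo, hi in ranges) for _ in range(trials)]
    return statistics.stdev(totals)

# Hypothetical (low, high) ranges in hours, before and after investigating
# the most uncertain story (the last one).
before = [(3, 9), (5, 13), (7, 17), (8, 22)]
after = [(3, 9), (5, 13), (7, 17), (13, 17)]

spread_before = total_spread(before)
spread_after = total_spread(after)
print(f"spread before: {spread_before:.1f} hours")
print(f"spread after:  {spread_after:.1f} hours")
```

Targeting the widest range first gives the largest reduction in spread for the analysis effort spent, which is one reason to hunt for the most significant risk rather than refine every story equally.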
Keep in mind that capacity planning is not release planning. Capacity planning is effort estimation, but we can’t just paste it into a calendar and expect that to serve as a delivery schedule. Effort estimation is a good start, but we need to incorporate queue time into our model to use estimates to manage stakeholder expectations.
Let’s agree to define productivity in terms of throughput. We can debate the meaning of productivity in terms of additional measurements of the business value of delivered work, but as Eliyahu Goldratt pointed out in his critique of the Balanced Scorecard, there is a virtue in simplicity. Throughput doesn’t answer all our questions about business value, but it is a sufficient metric for evaluating the relationship between practices and productivity.