Confidence Interval
measuring against a known quantity
When we quantify our uncertainty by expressing our estimate as a range, we are saying that the actual value is likely to fall somewhere in that range. You might be wondering: what do we mean by likely?
We could say with high confidence that our task is going to take somewhere between a day and a year. We could say that with 100% confidence, and it would be helpful to no one. To be useful, your range has to be within striking distance of reality: a 90% confidence interval should contain the actual value about nine times out of ten. Having the actual value occasionally fall above or below your stated range tells us that your ranges aren't padded into uselessness. If nine out of ten times the actual value falls in your stated range, then we'll call you a properly calibrated estimator.
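That nine-out-of-ten check can be expressed directly in code. A minimal sketch, where the history of (low, high, actual) range estimates is hypothetical data invented for illustration:

```python
def calibration_rate(estimates):
    """Fraction of (low, high, actual) triples where the actual
    value fell inside the stated range."""
    hits = sum(1 for low, high, actual in estimates if low <= actual <= high)
    return hits / len(estimates)

# Hypothetical 90%-confidence range estimates (in days) vs. actual effort.
history = [
    (2, 5, 4), (1, 3, 2), (5, 10, 12),   # the third estimate missed high
    (3, 8, 6), (1, 2, 2), (4, 9, 7),
    (2, 6, 3), (6, 14, 10), (1, 4, 4), (3, 7, 5),
]

# Nine of ten actuals fall inside their stated ranges, so this
# record matches a properly calibrated 90% estimator.
rate = calibration_rate(history)
```

A single hit rate is only meaningful over a reasonable sample; one or two estimates tell you nothing about calibration.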
Mostly when we think of calibration, we're thinking of tools. For me, 2001 was a year immersed in questions about tool calibration. I was working on aircraft maintenance planning software at the largest US regional air carrier, and in January 2000, an Alaska Airlines MD-83 had plunged into the Pacific Ocean off the Southern California coast. The horizontal stabilizer's trim jackscrew jammed; the loss of control was catastrophic, and there were no survivors.
The investigation into the crash concluded that although the inspecting mechanic had estimated the jackscrew's remaining service life using a tool specifically designed for that purpose, the tool had not been calibrated before use. Calibration is a procedure in which you measure against a known quantity so that, when the tool is later applied to an unknown, you can have confidence in the observation.
Bookies and insurance actuaries tend to be inherently calibrated (or unemployed). The rest of us are subject to an array of biases that leave judgments based on expert opinion alone prone to wide swings of overconfidence and underconfidence, especially overconfidence, where the actual effort is underestimated.
Estimation calibration for people is a series of exercises designed to raise an individual's awareness of how closely their stated probabilities correspond to their actual uncertainty about arbitrary questions. Arbitrary questions are the known quantities we test against to identify overconfidence and underconfidence, so that when you later measure against an unknown, you can be more confident in your observation.
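The diagnosis from such an exercise reduces to comparing your hit rate on the test questions against the confidence level you claimed. A hypothetical sketch, where the quiz score is invented for illustration:

```python
def diagnose(hit_rate, target=0.9):
    """Classify an estimator's calibration from exercise results.

    hit_rate: fraction of stated 90%-confidence ranges that
    contained the true answer on a set of test questions.
    """
    if hit_rate < target:
        return "overconfident: ranges too narrow"
    if hit_rate > target:
        return "underconfident: ranges too wide"
    return "calibrated"

# Hypothetical quiz outcome: only 6 of 10 ranges contained
# the true answer, a common result before calibration training.
verdict = diagnose(6 / 10)
```

Most people's first score lands on the overconfident side, which is exactly why the exercises test against known quantities before you estimate real work.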
Let's agree to define productivity in terms of throughput. We can debate the meaning of productivity in terms of additional measurements of the business value of delivered work, but as Eliyahu Goldratt pointed out in his critique of the Balanced Scorecard, there is virtue in simplicity. Throughput doesn't answer all our questions about business value, but it is a sufficient metric in the context of evaluating the relationship between practices and productivity.