Bottlenecks and flowmeters: models for performance management

In “The Hidden Bottleneck”, Bob Wescott invokes 19th-century agronomy to make the point that observing a bottlenecked system mostly tells you about the bottleneck, not the whole system. This is eminently practical advice. The hazard that Wescott, author of The Every Computer Performance Book, points out is the assumption that, if your downstream network connection throttles your site’s performance, then tripling the speed of that connection will triple your overall performance.

This is a mistake. The most likely outcome of this situation is that you’ll discover the next bottleneck in your system.

Speaking with decision makers

Wescott thinks of it this way: if you promise your manager that you’re starting a project to improve performance, and you don’t account for the likelihood of multiple bottlenecks, “… you will have some explaining to do.”

Wescott adds that “… I’ve never seen a performance problem where there were more than two bottlenecks that had to be cleared up …”

As good as Wescott’s advice is, I also want DevOps staff with performance responsibilities to have a second model available for their analyses. Many applications look like pipelines; a Web application, for instance, might involve handoff from a front-end proxy to an application server to a database adapter to a database server to a load balancer and back out again. When a system with this many “moving parts” needs scaling up, the starting point I recommend is to measure the peak throughput of each component in isolation.

This might require investment in new testing or staging assets. If so, you probably need those testing facilities for plenty of other purposes anyway. The immediate aim, though, is to construct a table, something on the order of

Segment               Throughput (kilorequests per second)
Front-end proxy       400
Application server    2
Database adapter      180
Database server       5

With measurements like these in hand, you can quickly read off that doubling performance probably can be done with fixes entirely in the application server. Quadrupling throughput might take work with the database server, too. If your aim is a hundred times the throughput … well, you have work in multiple domains.
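
The same reading of the table can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every request passes through each segment once, the pipeline as a whole runs no faster than its slowest segment, and the figures measured in isolation still hold when the segments are combined. The names SEGMENTS, pipeline_throughput, and segments_to_widen are made up for this example.

```python
# Peak throughput per segment, in kilorequests per second,
# using the illustrative figures from the table above.
SEGMENTS = {
    "Front-end proxy": 400,
    "Application server": 2,
    "Database adapter": 180,
    "Database server": 5,
}


def pipeline_throughput(segments):
    """Assumption: a serial pipeline goes no faster than its slowest segment."""
    return min(segments.values())


def segments_to_widen(segments, factor):
    """Return the segments whose measured peak falls short of the target.

    The target is the current pipeline throughput multiplied by `factor`.
    """
    target = pipeline_throughput(segments) * factor
    return [name for name, peak in segments.items() if peak < target]


if __name__ == "__main__":
    for factor in (2, 4, 100):
        names = ", ".join(segments_to_widen(SEGMENTS, factor))
        print(f"{factor}x: {names}")
```

For a factor of 2 this lists only the application server; for 4 it adds the database server; for 100 it also adds the database adapter, leaving the front-end proxy as the one segment with headroom to spare.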

This kind of analysis doesn’t solve performance problems, of course. It does help you focus on what is necessary and what makes a difference. Instead of having to wait until hidden bottlenecks reveal themselves, you can start planning to widen them near the beginning of your project.