“Technical debt” in modern business

Software matters. It matters enough to be the focus of a good popularization such as Leo Kelion’s “Why banks are likely to face more software glitches in 2013”, from BBC News.

It’s more than just banks, it’s more than just British business, and 2013 certainly won’t see the last of “software glitches”. Take a few moments to consider the situation that so many “IT Ops” readers face.

How much does software matter?

A crucial piece of the background here is Nicholas Carr’s “IT Doesn’t Matter”, first published ten years ago, which described software as destined for commoditization: necessary for competing, but not a source of strategic advantage. Whatever boundaries we eventually determine for Carr’s thesis, his point is important at least because it characterizes the way enterprises now see information technology (IT): software is a fungible utility. Kelion observes large-scale system errors affecting finance, such as the Knight Capital $400 million trading error, and rightly looks for an underlying explanation. He finds it in “horrifically complex business software”, resulting from “complicated modern computer systems”, “tough financial times [which] squeeze budgets and [make for] less effort spent on modernization and quality assurance”, “massive underinvestment in technology”, “software from third parties”, and industry-wide consolidation which results not in streamlined engineering, but only in another layer of management piled on top of undigested complexity.

That does sound like a prescription for trouble, doesn’t it? Again, the symptoms extend far beyond the companies Kelion names. US telecoms providers, for instance, are notorious for the complexity and fragility of their billing systems, always in transition because of the continual turmoil of acquisition and divestiture in that sector. Just last week, I touched on systematic underinvestment in the security and performance of IT across all industries.

There’s no particular “solution” for this problem. Strategically, many executives don’t perceive a problem. It might annoy them when a slide or two claiming cost reductions from consolidating acquired units’ operations eventually proves fanciful, but how often does responsibility for engineering performance or customer satisfaction determine top decision-makers’ reputations? However correct Kelion is about technical challenges, they all remain tactical. “Complexity”, for instance, is one of our frequent topics: performance management, done right, can no longer assume a simple local network topology, but needs to account for elements in the cloud, distribution networks, software-defined networking, and a host of other variables that have become important over the last decade.

Technique isn’t the part holding us back

Given adequate resources, modern application performance management (APM) is up to these challenges and can prevent application faults. When security breaches, performance breakdowns, and mangled functionality are acceptable costs of business-as-usual, they persist; when engineers receive support to invest in clean-up of the “technical debt” Kelion aptly outlines, operations can improve considerably. If “we have to be prepared for further software failures …”, as Kelion concludes, it’s more because business is comfortable with the current level of quality in our systems than because of any intrinsic limits to engineering capability or our ability to craft correct software solutions.