For many, “IT outage” first brings to mind what happens when a million subscribers can’t watch the Netflix movie they had in mind, or New York Times staffers are unable to update their stories for several hours. That was the emphasis of last week’s “Eliminate Outages: the engineering challenge”.
In certain industries, though, information technology (IT) is much closer to being the matter of “life and death” that’s otherwise just a cliché: when computers go off-line in air transport, chemical refineries, hospitals or utilities, lives truly are put at immediate risk.
Medical IT presents particularly dramatic cases. Patient care is an enormous business that touches nearly every household; it’s highly decentralized and poorly standardized; and it’s in the midst of upheaval as organizations lurch toward mandated electronic health record (EHR) technologies. Too many outages that crippled institutions for ten days (!), and occasionally longer, have already happened.
Perfection is not the answer. Tempting slogans like “zero down-time!” or “Always Available” are as much a problem as a solution, to the extent that they distract from realistic, sustainable answers. In aviation, for instance, leading-edge fly-by-wire (FBW) controls have been documented as a cause of crashes, even though their overall effect is to improve safety. The same is believed to hold in medicine: an EHR system that goes off-line (at least in part) for a day leaves medical emergencies in its wake–but the net effect of EHR appears to be to improve medical care and efficiency. Certainly medication and other errors are common with paper-based hospital workflows.
The effective way forward is to define appropriate but realistic requirements for IT systems–not “instant-on!”, for example, but responses within two seconds. Balanced criteria afford IT teams opportunities to design and implement real-world solutions.
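A requirement such as “responses within two seconds” only becomes testable once it says how many responses must meet it. The Python sketch below is one hypothetical way to write that down: the two-second threshold comes from the text above, while the 95% target, the function names and the sample timings are assumptions made purely for illustration.

```python
def fraction_within(samples_s, threshold_s=2.0):
    """Return the fraction of response times at or under threshold_s seconds."""
    if not samples_s:
        raise ValueError("need at least one response-time sample")
    within = sum(1 for s in samples_s if s <= threshold_s)
    return within / len(samples_s)


def meets_target(samples_s, threshold_s=2.0, required_fraction=0.95):
    """Check a requirement of the form 'required_fraction of responses
    complete within threshold_s seconds'. The 0.95 default is an assumed
    target, not a figure from any standard."""
    return fraction_within(samples_s, threshold_s) >= required_fraction


# Made-up response times, in seconds, for one screen of a hypothetical system.
samples = [0.4, 0.7, 1.1, 0.9, 1.8, 2.6, 0.5, 1.2, 0.8, 1.0]

print(fraction_within(samples))  # share of responses at or under 2 s
print(meets_target(samples))     # compared against the assumed 95% target
```

Stating the target as a fraction rather than an absolute keeps it realistic: a single slow response does not count as failure, but a pattern of them does.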
One promising technology for high-quality EHR delivery is application performance management (APM). APM already has proven successes in measuring, monitoring and controlling the actual end-user experience (EUE) of the doctors, nurses and ancillary staff in EHR-adopting institutions. While APM doesn’t eliminate downtime on its own, it’s a cost-effective tool for managing and improving system-wide performance.