Storage now dominates application performance

“[A]n efficient storage system is critical”, writes Gary Watson, Chief Technology Officer for Nexsan, in his article, “Choosing the Right Storage Architecture for your Applications and Environment”. He’s right.

I particularly like “Choosing …” for its apt mastery of the details that illuminate today’s critical storage design questions. At a high level, application delivery depends on:

  • CPU;
  • memory;
  • network; and
  • mass storage.

Moore’s Law, and its relatives, govern the first two of these. We have good ways to virtualize and otherwise “commoditize” memory and CPU, and performance and price for these two close teammates advance predictably.

Network virtualization is only beginning. There’s a long path ahead in SDN (software-defined networking) and related concepts that will make networking more nimble in the coming years. For now, network management remains a disappointingly primitive, messy, and all-too-human craft.

Mass storage adds further complications. On one hand, storage virtualization is robust, with a healthy vendor market, plenty of knowledgeable commentators and an abundance of experienced consultants. On the other, storage is not like the other three domains: it adds persistence to the picture. If a motherboard or router goes out, we replace it and don’t look back. The value of data storage, though, inheres not only in its use in today’s application sessions, but in the trust we have that all the data stored since commissioning remain available.

This makes the business of storage fundamentally different from the other areas. If I have questions about a memory chip, for instance, I can plug it into a testing unit and quickly quantify its performance–how fast data move to and from the chip, and its accuracy in reporting them. Mass storage must do all the same, but we also require it to return, five years from now, the data stored today. That’s a qualitatively different responsibility whose validation remains controversial.

The consequence: far more than with other domains of computing, mass storage rests on reputation. That’s why storage margins are so high and erratic: in the absence of good engineering tests, consumers use price and appearance as proxies for reliability. This also explains a portion of storage’s relative conservatism: while it’s utterly clear that SSDs (solid-state drives) are overdue to take over for rotating spindles in thousands of datacenters where speed and power matter, we’re unconvinced, as an industry, how much we can trust SSDs over five or ten years of hard usage.

Virtualization is both a source of problems and of partial solutions in the persistence dimension. As Watson accurately describes, today’s virtualized computing loads have considerably different storage access patterns than were common in the datacenter a decade ago: sequential access is relatively less frequent, and the read-vs.-write ratio shifts in application-specific ways. At the same time, virtualized storage makes it far easier to customize persistence solutions by disk type, RAID level, throughput, latency, and capacity. In the past, datacenters had to “take” the mass storage products offered them, mostly clustered around a few favored technical combinations. Today’s virtualized storage makes it far more practical to configure more efficient storage systems.
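To make that customization concrete, here is a minimal sketch, in Python, of what per-workload persistence profiles might look like. The profile names, field names, and figures are hypothetical illustrations, not any vendor’s API or product specifications.

```python
# Hypothetical sketch: declaring distinct persistence profiles per workload,
# rather than accepting one monolithic array configuration.
from dataclasses import dataclass

@dataclass
class StorageProfile:
    media: str            # e.g. "ssd", "10k-sas", "7.2k-nl-sas"
    raid_level: str       # e.g. "raid10", "raid6"
    min_iops: int         # sustained IOPS the tier must deliver
    max_latency_ms: float # acceptable per-operation latency
    capacity_tb: float    # usable capacity for the tier

# Application-specific profiles instead of one-size-fits-all storage.
profiles = {
    "oltp-database":  StorageProfile("ssd",         "raid10", 20_000, 1.0,    4),
    "vm-boot-images": StorageProfile("10k-sas",     "raid10",  3_000, 5.0,   20),
    "archive":        StorageProfile("7.2k-nl-sas", "raid6",      10, 50.0, 200),
}

for name, p in profiles.items():
    print(f"{name}: {p.media}/{p.raid_level}, "
          f">= {p.min_iops} IOPS, <= {p.max_latency_ms} ms, {p.capacity_tb} TB")
```

The point of the sketch is the shape of the decision, not the numbers: each workload gets its own media, RAID level, and performance floor, rather than inheriting whatever the array happens to offer.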

What does this mean for the design and operation of individual datacenters? It puts a premium on knowing your own business. You need to analyze explicitly your usage patterns and requirements. Carefully distinguish the time scales at which your persistence operates–maybe you need 20,000 IOPS (input/output operations per second) for daily transactional loads, but it’s adequate to retrieve six-month-old archives at a languid 10 IOPS. Factor in the physical space that constrains interconnect technologies, along with disaster recovery, business continuity, and availability goals. Use SSD technology first as cache, where, as Watson rightly points out, its advantages are most undeniable. Investigate how different vendors are experimenting with “software-defined storage”, and whether their offerings fit your situation.
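A rough back-of-envelope calculation shows why distinguishing those time scales pays off. The per-device IOPS figures and RAID write penalties below are assumed, ballpark values for illustration only; substitute measured numbers for your own gear.

```python
# Back-of-envelope sizing sketch: how many devices a tier needs to meet an
# IOPS target, once RAID write amplification is accounted for.
import math

def drives_needed(target_iops, read_fraction, device_iops, raid_write_penalty):
    """Devices required after accounting for RAID write amplification."""
    write_fraction = 1.0 - read_fraction
    effective_iops = target_iops * (read_fraction + write_fraction * raid_write_penalty)
    return math.ceil(effective_iops / device_iops)

# Assumed figures: ~180 IOPS for a 15K spindle, ~50,000 for an enterprise SSD,
# ~75 for a 7.2K NL-SAS drive; RAID-10 write penalty ~2, RAID-6 ~6.
print("Transactional tier (20,000 IOPS, 70% reads):")
print("  15K spindles needed:", drives_needed(20_000, 0.7, 180, 2))
print("  SSDs needed:        ", drives_needed(20_000, 0.7, 50_000, 2))

print("Archive tier (10 IOPS, 95% reads):")
print("  7.2K NL-SAS needed: ", drives_needed(10, 0.95, 75, 6))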
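```

Under these assumptions, the transactional tier would demand well over a hundred fast spindles but only a handful of SSDs, while the archive tier is satisfied by almost anything. That asymmetry is the argument for tiering rather than buying one class of storage for everything.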

In plenty of datacenters, the most severe application performance bottlenecks and the largest power consumers are one and the same: spinning disk drives. Virtualized storage gives unprecedented opportunity, though, to wring more efficiency out of storage. It’s time to seize that opportunity.