Bring Data and Applications Practices Together for Best Performance

Science vs. humanities. Mac vs. PC. East vs. West. Jay-Z vs. LL. Divides between “Two Cultures” abound. One of them particularly matters to performance management, though, because it has such a direct impact and because, with effort, it can be at least partially bridged: data vs. application.

Pop quiz: when you ask someone on your DevOps team to diagram, say, your e-mail system or your customer relationship management (CRM) infrastructure, does she begin with the application (server and client processes) or the data (e-mail storage, customer records, and so on)? Do you see the distinction? Do you know how deep it runs, and that wrong choices in navigating it can degrade performance by multiple orders of magnitude?

Agile consultant Scott Ambler has already written extensively on this “cultural impedance mismatch”, and he gives good advice on the hazards and solutions. I supplement his recommendations with a few fine-grained observations bearing on performance, drawn from situations I’ve experienced repeatedly:

  • data and app people don’t agree on where objects live;
  • app and data people have different ideas about the natural way to express computation; and
  • security is a perfect model for the differences in perspective between data and application.

How home looks

Both data analysts and programmers talk about where objects “live”; they tend to mean entirely different things with those words, though. A programmer who needs, say, a short string for a piece of text might embed it in a *.conf or *.dat file, and only feels comfortable when that file image is committed to GitHub or a similar repository.

This makes no sense to a data person, who knows that the text belongs in a database, where it can be replicated, versioned, typed, and so on.

Conversations between the two camps sound like Shakespearean comedies, dense with mistaken identities that are perfectly clear to the audience. It’s no laughing matter, though, when these confusions lead to updates being mis-applied, or load balancers becoming, in effect, load im-balancers.

The solution is straightforward: all parties need to agree on where the “home” for each individual object is, and how the object manifests in test, development, staging, and production instances. This does not mean that either application or data rules the other; each side might well take responsibility for an appropriate collection of objects. Problems start when any one object has either zero or two guardians.
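One way to make the "exactly one guardian" rule checkable is to keep a small inventory of objects and the teams that claim them. The sketch below is only an illustration under assumed names: the object names, the team labels, and the function `find_ownership_problems` are all hypothetical, and a real inventory would likely live in whatever tracking tool the teams already share.

```python
# Hypothetical inventory of objects the system depends on (names are illustrative).
KNOWN_OBJECTS = [
    "welcome_banner_text",
    "customer_records",
    "price_schedule",
    "smtp_relay_config",
]

# Hypothetical claims: (object name, team that says it is the guardian).
CLAIMS = [
    ("welcome_banner_text", "application"),
    ("customer_records", "data"),
    ("price_schedule", "application"),
    ("price_schedule", "data"),
]


def find_ownership_problems(known_objects, claims):
    """Return objects that do not have exactly one guardian.

    The result maps each problem object to the sorted list of teams
    claiming it: an empty list means nobody owns it, and two or more
    entries mean ownership is contested.
    """
    owners = {name: set() for name in known_objects}
    for name, team in claims:
        owners.setdefault(name, set()).add(team)
    return {
        name: sorted(teams)
        for name, teams in owners.items()
        if len(teams) != 1
    }


problems = find_ownership_problems(KNOWN_OBJECTS, CLAIMS)
for name, teams in problems.items():
    print(name, "->", teams if teams else "no guardian")
```

With the sample data, `price_schedule` is flagged because both teams claim it, and `smtp_relay_config` is flagged because nobody does. The point of the sketch is the agreement it forces, not the tooling.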

What language is this?

Confusion compounds when elements are computationally “active”. Programmers assume that the only fit home for a calculation about, say, the price schedule to apply for a particular customer is a function definition written in a particular programming language.

Database practitioners prefer to keep such a computation in a stored procedure or user function within a database instance.

The difference between these is more than just a territorial display. It’s easy for such a choice to determine a hundred-fold speed-up or slow-down. Worse, plenty of experienced IT (information technology) staffers don’t realize what big consequences an apparently simple migration from stored procedure to PHP, or Java to user-defined function (UDF), can have.
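To make the two placements concrete, here is a minimal sketch using Python's standard `sqlite3` module with an in-memory database. The `orders` table, its columns, and both function names are assumptions invented for illustration; SQLite has no stored procedures, so a plain aggregate query stands in for "computation kept in the database". Both functions return the same totals; what differs is where the work happens and how much data crosses the connection.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (1, 15.0), (2, 7.5), (2, 2.5), (3, 40.0)],
)


def totals_in_application(conn):
    """Application-side: fetch every row, then sum in Python."""
    totals = {}
    for customer_id, amount in conn.execute(
        "SELECT customer_id, amount FROM orders"
    ):
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return totals


def totals_in_database(conn):
    """Database-side: let the engine aggregate, fetch one row per customer."""
    rows = conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    )
    return dict(rows)


app_totals = totals_in_application(conn)
db_totals = totals_in_database(conn)
print(app_totals == db_totals)
```

The first version transfers every order row to the application; the second transfers one row per customer. Which is faster in a given system depends on data volume, indexing, network distance, and the engine, so any migration in either direction deserves measurement rather than assumption.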

Security exhibits a similar split: application programmers tend to assume that security inheres in an application and its processes. In this model, the application “decides” what it needs. For data professionals, in contrast, security naturally is defined in terms of privileges associated with individual data elements, and applications need to demonstrate they have the right to access those data.
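The two security models can be contrasted in a few lines. This is a deliberately simplified sketch, not a recommendation for either design: the role names, application names, data element names, and both functions are hypothetical.

```python
# Application-centric view (hypothetical): the application's own logic
# decides what a user may do, based on a role the application tracks.
ROLES_ALLOWED_TO_EDIT = {"admin", "sales"}


def app_decides_can_edit(user_role):
    """The application 'decides' what it needs."""
    return user_role in ROLES_ALLOWED_TO_EDIT


# Data-centric view (hypothetical): privileges are attached to each data
# element, and every caller must show it holds the required privilege.
PRIVILEGES = {
    ("customer_records", "read"): {"crm_app", "reporting_app"},
    ("customer_records", "write"): {"crm_app"},
}


def data_grants_access(principal, element, action):
    """Access is granted only if the data element lists the principal."""
    return principal in PRIVILEGES.get((element, action), set())


print(app_decides_can_edit("sales"))
print(data_grants_access("reporting_app", "customer_records", "write"))
```

In the first model nothing stops a second application from reaching the same data with different rules; in the second, an unlisted element or action is denied by default. Teams that mix the models without saying so tend to end up with gaps or duplicated checks.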

As subtle and small as these differences might sound, their practical consequences can swamp any other efforts of a team focused on performance or security. The good news is that the solution is straightforward: someone with sufficient authority and understanding needs to bring the application and data teams together to agree on who’s in charge of what. This agreement also needs to cover the workflows that bring applications and data together into useful systems.

However simple that sounds, organizations so often fumble this co-operation that, as one recent white paper on Database Application Lifecycle puts it, “… more than half of all application failures and downtime are caused by software change, configuration, release integration, and hand-off issues.” You don’t need to buy a product or adopt a comprehensive methodology to solve most of this; instead, start by having your data and application specialists talk with each other.