Slow SQL? My reactions to "MySQL Benchmarking Best Practices"

“The Gold Standard: MySQL Benchmarking Best Practices” gives good advice. It’s also incomplete, of course.

Measurement is always valuable, in my experience. I can hypothesize a circumstance in which it only amplifies confusion, or leads to a conclusion that was obvious before-hand. That’s never happened to me, however; careful measurement that I have observed, without exception, yielded interesting results. I only salute zhirayr for promotion of careful quantification of database access.

There’s little in the article that’s specific to MySQL, or SQL in general. Halfway through, the article introduces SysBench, which is indeed focused on MySQL. Other benchmarking jigs for other database management systems (DBMSs) exist, though, and nearly all of the ideas in the article apply not just to relational DBMSs (RDBMSs) like MySQL, Oracle, and DB2, but also to NoSQL, object-oriented DBMSs (OODBMSs), in-memory DBMSs, and more.

Big world

zhirayr concludes, “… MySQL performance benchmarking varies for … every organisation.” This can’t be over-emphasized. Data management is a vast subject, one whose multi-dimensionality is a challenge to capture in English. It’s not just zhirayr’s “varying demands, time frames and goals”, but a myriad of technical details dramatically impact database performance.

I particularly underline this point because sometimes well-meaning newcomers to data management think about benchmarking in terms of standard specific queries. Useful as these are as a starting point, they’re only a starting point. Actual experience for your situation might depend on such specifics of configuration that zhirayr mentions as caching and networking infrastructure. Others that don’t appear in the article are version sensitivity (MySQL and many other RDBMSs dramatically improve specific query optimizations from one release to the next), security considerations (certain access settings degrade affect performance by a large factor), virtualization complications (is the benchmarked instance comparable to the production one in its end-to-end latencies?), replication requirements or configurations, clustering, and even such apparently innocuous elements as locale (extreme examples of the latter can degrade common queries by a factor of a hundred!). Performance is notoriously non-linear in load: small increases in the demands made of a DBMS in certain ranges result in disproportionate degradations of response time and slow SQL queries. Also, MySQL and a few other DBMSs abound in “modes”: MySQL can be configured, for example, to validate and type-check arriving updates according to several different standard templates. Rigorous benchmarks take all these variables into consideration.

At a deeper level, patterns of usage vary greatly between applications and organizations. In the broadest terms, typical MySQL usage spotlights fast retrieval by a single reading pool of relatively broad queries. Oracle or PostgreSQL, in contrast, often host a lot of “business logic” in user-defined functions and other DBMS-side processing, and have multiple applications updating content. These are general-purpose tools, though, and any DBMS can support a wide range of data and applications models. A benchmark which doesn’t account for update load tells us little about retrieval performance in its presence–and vice-versa. Explicit description of every aspect and assumption of your benchmarking pays off.

Conclusion

I have minor reservations about zhirayr’s article: I don’t understand, for example, what “The input data should be checked multiple times” means. Other pieces he has written recently, such as “Identifying MySQL Performance Issues“, help fill in a few of these gaps.

My own summary:

  • It always pays to measure DBMS performance;
  • The parts of benchmarking that are specific to MySQL are small. Most of benchmarking applies no matter what DBMS(s) you use;
  • The benchmarking youdo is far, far more valuable than any other kind, because DBMS usage varies over such an extreme range. No one else has your combination of usage patterns and performance requirements; and
  • Differences in the deepest levels of information architecture can reverse performance comparisons at the level of technical configuration.

With these tips, the burdens of slow SQL performance can be minimized.