SQL’s window functions are among my favorite techniques for boosting the readability and maintainability of database-backed applications. In many practical cases they also multiply performance of individual queries by a factor of ten or more.
Window functions often confuse intermediate SQL students because, I think, they often are presented in a fragmentary fashion. Melanie Caffrey, for instance, a senior development manager for Oracle and an expert programmer and author, gives the impression in a just-published article that they are not “standard SQL” or “pure SQL”. Like quite a few other authors, she writes as though they are an Oracle-specific “extension”, and, I think, scares off at least a few developers who take pains to write portable code.
Although none of this is wrong, there’s a more comprehensive and constructive way to think about window functions. Window functions–which Caffrey refers to as “analytic”, and which SQL slingers sometimes colloquially refer to as “
OVER“-s, because they hinge on the new
OVER keyword–appeared as part of the SQL:2003 standard. While that’s a decade old, senior programmers occasionally show themselves stuck on the origins of window functions as part of the ANSI OLAP extension.
Windows functions remain an “extension” for other commentators because MySQL and SQLite don’t offer them. On the other hand, they are prominent for Sybase, Postgres, DB2, and other relational database management systems (RDBMSs).
A final reason for the relative neglect of window functions is that they are more convenience than necessity. Josh Davey, for instance, nicely illustrated last year that SQL3 (an older standard, also known as SQL:1999) code such as
SELECT DISTINCT ON(category_id) cat_pictures.* FROM cat_pictures WHERE ((category_id, price) IN ( SELECT category_id, min(price) FROM cat_pictures GROUP BY category_id )) ORDER BY category_id ASC, cat_pictures.name ASC
achieves a typical effect of a window function. This is generally true: any SQL query that relies on windows functions can, with enough patience and subqueries, generally be re-written not to rely on a window.
Davey explains the advantage of window functions well. In his example,
SELECT id, category_id, name, price FROM cat_pictures WHERE (id, 1) IN ( SELECT id, row_number() OVER (PARTITION BY category_id ORDER BY price, name) FROM cat_pictures );
Davey rightly summarizes: “Groovy.” For this particular case, the latter, window-based expression is “consistently twice as fast” in his tests. More than that, I think the window-based alternative is easier to read, understand, and maintain, even for SQL programmers without previous exposure to analytics.
Caffrey’s summary is more formal, but equally true: “Using analytic functions is a powerful way to get answers about your data that would otherwise require convoluted, possibly poorly performing SQL.” Her deeper introduction to the topic touches on such topics as the distinction between sliding and anchored windows, and the detailed syntax of
Finally, understand that windows functions are not just widely-available across a range of RDBMSs, and provide multiple advantages; recognize also how practical they are, in that they apply to common situations. When I’m doing budget work, for example, I report on different projects in rank order, and display their cumulative budget impact. This makes it easy to see “where the yellow line is”, that is, which projects are ranked high enough to be funded, and which are below the threshold at which the budget constraint has been exhausted. Any retrieval which involves ranking and arithmetic–in this case, a sum over budget commitments–is a natural candidate for expression in terms of an SQL window function.