High-level and low-level revision-control tips for the datacenter

Many datacenter teams don’t know the basics of revision control systems (RCS).

I say that to be helpful, not to provoke or to insult anyone’s intelligence. Information technology (IT) changes rapidly, and we all have a lot to learn. While RCS (also known as version control, source-code control, release control, software configuration management, and so on) is an old and well-established toolset, it has rarely been taught academically. The gaps I’ve encountered even in well-run datacenters make me think it’s timely to offer a few tips.

RCS documentation is often addressed to programmers, rather than administrators or network managers. RCS has an important role for the latter, though. The basic idea of RCS in the datacenter is that provisioning and configuration are distinct, and best managed separately. Confusing the two only makes things harder. Consider the common case of an isolated e-mail server: in principle, the healthy way to bring one on-line (especially if it’s “in the cloud”, or an outsourced service, or otherwise not a traditional identifiable box on a particular rack) is first to provision a standard server, including all hardware accommodations, patches, disabled services, and so on. Then, with the server in a known healthy state, e-mail software is installed and configured for a specific chosen role.

There are plenty of ways to realize this abstract sequence. At one extreme, the server configuration might be captured as a file image which can be played as a virtual machine appliance. At another, it’s common in the Unix/Linux world to update textual .conf and similar files on a physical server to specify the exact SMTP (Simple Mail Transfer Protocol) service desired.

In either case, the point is that, at a managerial level, conversations need to be about, “On date DDDD, administrator NNNN brought up reproducible configuration MMMM”, rather than, “we think we remember that NNNN got mail services working on DDDD.” The revision control toolchest is all about enabling crisp declarations like the former one.
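
As a minimal sketch of what such a crisp declaration could look like, the following Python derives a short fingerprint from a configuration's text and formats the statement around it. The names `fingerprint` and `DeploymentRecord`, the sample configuration text, and the sample date are all hypothetical; a real RCS supplies its own revision identifiers, which would normally stand in for the fingerprint here.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


def fingerprint(config_text: str) -> str:
    """Return a short, stable identifier derived from the configuration text."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()[:12]


@dataclass(frozen=True)
class DeploymentRecord:
    """Who brought up which configuration, and when (hypothetical structure)."""
    when: date
    administrator: str
    config_id: str

    def statement(self) -> str:
        return (
            f"On {self.when.isoformat()}, administrator {self.administrator} "
            f"brought up reproducible configuration {self.config_id}"
        )


# Hypothetical sample data, for illustration only.
sample_config = "mail_role = relay\nlisten_port = 25\n"
record = DeploymentRecord(date(2024, 1, 15), "NNNN", fingerprint(sample_config))
print(record.statement())
```

Because the identifier depends only on the configuration's content, the same text always yields the same identifier, which is the property that makes the declaration checkable later.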

While programmers relish arguing the details of competing RCSs, the differences between them are small from a DevOps perspective. If you’re fortunate enough to have an RCS or, even better, application lifecycle management (ALM), already in place as an organizational standard, use it. If not, figure out what works in your situation, and apply it to everything you do in the datacenter. The ideal is that nothing in production results from “manual” configuration; instead, everything is a standard, reproducible and controlled revision or version.

If you’re new to RCS concepts, read through Eric Sink’s Version Control by Example and Luke Kanies’ “using version control in system administration”. The former is aimed at programmers, and promotes distributed RCS (DRCS), which rarely applies to sysadmin situations. At the same time, Sink writes with exceptional clarity, and he’s eminently practical. Whatever particular technologies you use in your datacenter, keep these tips in mind:

  • Commit revisions frequently. Ideally, each revision should accomplish a single change, and of course each revision should include a comment which clearly explains that change. It truly is better to have two successive commits, “update formats to match new hostname scheme” and “update hostnames to new scheme”, than to combine the two. Commits are cheap. Use them freely.
  • Do not store derived files. Store a firewall’s textual deny-allow configuration, rather than the binary generated from that configuration. Among several other reasons, RCS works better with text than with binaries.
  • If you need to store binaries, though, don’t hesitate to do so. Prejudices against binaries in RCS linger decades after they made sense.
  • Tag anything with business meaning. Tags are also inexpensive, and they’re the right way to document milestones such as, “This is the initial configuration for our Toronto office.”
  • Do whatever it takes to integrate RCS with other ALM-class tools you use. Even if your tools don’t build in integration points, your comments should always relate to other facilities: “This completes the update which corrects the symptom reported in Trouble Ticket #TTTT” or “We create an experimental configuration CCCC for trials to be run with APM screen SSSS.SS”. It only takes a few seconds more to make these connections explicit, and the traceability they give from business need to update implementation is one of the most important habits DevOps can cultivate.
  • If you’re using anything other than a home-grown RCS, enable its Web interface. You’ll be surprised how useful it is to be able to navigate through your repository inside a standard browser.
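
The tips about commit comments lend themselves to a mechanical check. What follows is only a sketch under stated assumptions: that trouble tickets are written as `#` followed by digits, that a useful comment is at least 15 characters long, and that the word “and” hints at a commit doing two things at once. The function name `review_comment` and all three conventions are hypothetical, not features of any particular RCS.

```python
import re
from typing import List

# Assumed ticket notation, e.g. "#4821"; adjust to your ticketing system.
TICKET_PATTERN = re.compile(r"#\d+")
# Assumed minimum length for a comment that actually explains the change.
MIN_COMMENT_LENGTH = 15


def review_comment(comment: str) -> List[str]:
    """Return the problems found in a commit comment (empty list if none)."""
    problems = []
    text = comment.strip()
    if len(text) < MIN_COMMENT_LENGTH:
        problems.append("comment is too short to explain the change")
    if not TICKET_PATTERN.search(text):
        problems.append("comment does not reference a trouble ticket")
    # Crude heuristic: "and" often signals two changes folded into one commit.
    if " and " in text.lower():
        problems.append("comment may describe more than one change")
    return problems


if __name__ == "__main__":
    for candidate in (
        "Update hostnames to new scheme for ticket #4821",
        "Update formats and hostnames to new scheme, ticket #4821",
        "fix stuff",
    ):
        print(candidate, "->", review_comment(candidate) or "ok")
```

A check like this could run wherever your RCS lets you inspect a comment before it is accepted; the heuristics are deliberately simple and will need tuning to local habits.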