Choice of technologies for leak monitoring in Java applications

Leaks are not inevitable. Charley Rich, a vice president at Nastel Technologies, is right when he counsels that “application leaks must always either be prevented or fixed as quickly as possible.” He illustrates that memory leaks are serious problems even for Java applications: garbage collection in the Java virtual machine (JVM) leaves at least half a dozen points with leak potential, contrary to the naive opinion that Java doesn’t have memory problems. Detecting and correcting leaks is an important complement to application performance management (APM).
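To make the point concrete, here is a minimal sketch of one common way a garbage-collected program can still leak. It is not taken from Rich's list; the class name SessionRegistry and its methods are hypothetical. The garbage collector only reclaims objects that are unreachable, so anything left in a long-lived collection stays in memory indefinitely.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical example: a registry that grows without bound when callers forget to clean up. */
class SessionRegistry {
    // A static map lives as long as the class is loaded, so every entry
    // in it remains reachable and can never be garbage collected.
    private static final Map<String, byte[]> SESSIONS = new HashMap<>();

    /** Registers a session and allocates a 1 KiB buffer for it. */
    static void open(String sessionId) {
        SESSIONS.put(sessionId, new byte[1024]);
    }

    /** Releases the session's buffer. The leak appears when a code path skips this call. */
    static void close(String sessionId) {
        SESSIONS.remove(sessionId);
    }

    /** Number of sessions still held in memory. */
    static int retained() {
        return SESSIONS.size();
    }
}
```

If an error path or a timeout skips close(), each abandoned session keeps its buffer for the life of the process. Nothing crashes; the heap simply fills a little more with every request.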

Rich is insufficiently aggressive for my taste, though. The only technical solution he offers is heap-utilization monitoring (HUM). HUM is valuable, and natural to enable as a component of or complement to APM, especially from the operational side of a datacenter. Many developers, though, need to learn how to use static source analyzers and run-time diagnostic libraries to eliminate leaks.

I emphasize that word “eliminate” because I too often see HUM leading to incomplete or even misleading diagnoses. In my observation, practical use of HUM frequently stalls with a conclusion such as, “we know the problem has something to do with the load balancer, but we haven’t been able to isolate the exact sequence.”
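For readers who have not seen what a heap-utilization reading looks like, the following is a rough sketch using the standard java.lang.management API. The class name HeapSampler is hypothetical, and a production monitor would normally collect the same figures remotely rather than in-process. Note what the numbers do and do not say: they report how much heap is in use, not which code is holding on to it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: reads heap figures in-process through the platform MXBean. */
class HeapSampler {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Bytes of heap currently in use (live objects plus garbage not yet collected). */
    static long usedBytes() {
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    /** Used heap as a fraction of the maximum, or -1.0 when the JVM reports no maximum. */
    static double utilization() {
        MemoryUsage heap = MEMORY.getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        return max > 0 ? (double) heap.getUsed() / max : -1.0;
    }
}
```

A single reading means little, because used heap rises and falls with every collection cycle. The signal is in the trend across many readings, which is also why this kind of monitoring tends to localize a leak in time rather than in source code.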

When it works correctly, a static analysis tool gives precise line numbers in original Java class definitions. That “works correctly” part presents its own challenges. While I’ve used a number of static analysis tools to good, even powerful, effect, I have also met multiple experienced programmers who report only frustration and wasted motion with them. Use of these tools seems to be a skill distinct from “programming”, so it’s important to test the compatibility of your development team with any tool you consider. Different static-analysis tools seem to fit different mentalities or work styles.

Much the same is true of run-time memory profilers. I generally see run-time memory profilers more easily adopted by developers, and they identify most of the same errors that static tools turn up. HUM is one particular variety of run-time memory profiling; there are many others, from fee-free jstat, jconsole, and VisualVM to commercial offerings such as YourKit.

Part of the reason for the proliferation of tools is that they truly have different strengths and weaknesses. Any two tools will both quickly diagnose 80% or more of real-world memory faults; there’s no simple formula, though, for predicting which of the remaining edge cases each tool will find.

For all these reasons, I strongly recommend that every DevOps team at least experiment with memory testing tools. Find one that matches your technology set well, and run a serious pilot test. Unless your applications are very unusual, you’re likely to turn up at least a few memory defects that are worth fixing.

If you do, and so have evidence that memory management deserves to be an ongoing focus of your operations, a separate decision is how you’ll address the problem in a sustainable way. While I generally like to use one static and one run-time memory tool in any organization, the most important part of any such plan is to identify a tool or tools that work well in your environment. Start there.

Correction of memory errors is rewarding, because their presence so often leads to noise and mystery. An application with even a slow memory leak might perform well just after a restart but deteriorate to the point of unusability over the course of a week or a month. Fix the errors permanently, and you’ll be in a considerably better position to evaluate the application’s true performance.
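One way to put a number on that kind of slow deterioration is to fit a line through periodic heap readings and look at the slope. The sketch below is only an illustration under stated assumptions: the class HeapTrend is hypothetical, the samples are assumed to be evenly spaced, and they are ideally taken just after a full collection so that ordinary garbage does not distort the trend.

```java
/** Hypothetical helper: estimates heap growth from evenly spaced samples. */
class HeapTrend {
    /**
     * Least-squares slope of the samples, in bytes per sampling interval.
     * Returns 0 when fewer than two samples are given.
     */
    static double slope(long[] usedBytes) {
        int n = usedBytes.length;
        if (n < 2) {
            return 0.0;
        }
        double meanX = (n - 1) / 2.0; // mean of the sample indices 0..n-1
        double meanY = 0.0;
        for (long y : usedBytes) {
            meanY += y;
        }
        meanY /= n;

        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = i - meanX;
            num += dx * (usedBytes[i] - meanY);
            den += dx * dx;
        }
        return num / den;
    }
}
```

A slope that stays positive across days of samples suggests retained memory is growing, and dividing the remaining headroom by the slope gives a crude estimate of how long the application can run before the next forced restart. After a real fix, the same measurement should flatten out.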