Cloud Performance Management Driven by Continuous Change

Several trends are changing the way companies monitor infrastructure and applications for availability and performance:

    • Agile development – drives frequent application rollouts and code pushes into production, often without tooling built to support rollouts at that pace
    • Virtualization – demands new tools for managing OS resources such as CPU and memory
    • Private clouds – result in business services composed of many moving pieces (infrastructure, software and data) that are hard to manage from a performance perspective

The common denominator of these trends is change: an IT environment in constant flux. Applications are modified all the time, infrastructure is dynamic, and today’s activity will never look the same as yesterday’s.

This continuous change calls for a substantial shift in the way companies manage performance. For example, setting CPU thresholds and baselines that rely on historical data to decide what’s “good” or “bad” CPU utilization is no longer a valid way to measure and alert. If the things we measure change every day, a baseline is nearly obsolete by the time it is computed. Don’t get me wrong; I am not saying OS resources should not be monitored. But the role these metrics play in the performance scheme and the decision-making process simply needs to change.
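To make that concrete, here is a minimal sketch of the baseline-driven alerting this paragraph argues against. Everything in it (the function names, the three-sigma rule, the sample data) is an illustrative assumption, not any particular product’s logic:

```python
# A minimal sketch of static, history-based alerting: compute a CPU
# baseline from last week's samples and alert on deviation. All names
# and thresholds here are illustrative assumptions.
import statistics

def cpu_baseline(history: list[float]) -> tuple[float, float]:
    """Mean and standard deviation of past CPU-utilization samples (0-100)."""
    return statistics.mean(history), statistics.stdev(history)

def is_anomalous(sample: float, history: list[float], sigmas: float = 3.0) -> bool:
    """Flag a sample that strays more than `sigmas` deviations from the baseline."""
    mean, stdev = cpu_baseline(history)
    return abs(sample - mean) > sigmas * stdev

# The weakness: after a code push or a re-provisioned VM, `history` no
# longer describes the workload, so this check either floods you with
# false alarms or stays silent while users suffer.
last_week = [35.0, 40.2, 38.7, 41.5, 36.9, 39.3, 37.8]
print(is_anomalous(72.0, last_week))  # True -- but is 72% actually a problem?
```

The moment the workload changes, the history stops describing reality, and the answer this check gives is no longer about anything users experience.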

In today’s dynamic environments, the right way to manage performance is from the end-user transaction perspective. Come to think of it, the end-user experience is the only real, valid and “bullet-proof” (in terms of false alarms) indication that something is wrong. For example, if the end-user experience is good, it doesn’t matter that one of the elements processing the end-user request is running at 95 percent CPU utilization; that is probably how it should be. But if the end-user experience is poor, some element is causing the degradation. Maybe it is that server at 95 percent CPU, and maybe it isn’t. So how do we know?
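As a sketch of that decision logic, consider an experience-first rule: resource metrics never raise an alarm on their own; they only become leads once measured user response time breaches its target. The data structure, names and the 90 percent cutoff below are assumptions for illustration:

```python
# Experience-first alerting: only investigate resource metrics once
# the measured end-user response time has actually degraded.
from dataclasses import dataclass

@dataclass
class ComponentMetrics:
    name: str
    cpu_percent: float

def suspects(user_response_ms: float, sla_ms: float,
             components: list[ComponentMetrics]) -> list[str]:
    """Return components worth investigating, but only if users are affected."""
    if user_response_ms <= sla_ms:
        return []  # 95% CPU on some server is fine if users are happy
    # Experience is degraded: now high resource usage becomes a lead.
    return [c.name for c in components if c.cpu_percent > 90.0]

print(suspects(
    user_response_ms=4200, sla_ms=2000,
    components=[ComponentMetrics("web-1", 45.0), ComponentMetrics("db-1", 95.0)],
))  # ['db-1']
```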

Today’s innovative transaction management technologies, such as Correlsense SharePath, track each end-user transaction across its entire path and build behavior models that reflect the impact of each infrastructure element on the end-user experience. With this approach you gain immediate visibility into changes in transaction behavior and know, for a fact, which component is degrading the end-user experience, no matter how complex the application architecture and environment are, and no matter how many tiers and elements are invoked for each end-user request. Once you have isolated the element causing the disruption, you can examine its other performance metrics (such as CPU utilization) to determine, for example, whether resources need to be reallocated.
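Here is a generic sketch of the per-element attribution idea. This is not SharePath’s actual implementation or API, just an illustration: record each hop a traced request takes, total the time spent in each tier, and the dominant element falls out immediately:

```python
# Per-tier latency attribution for one traced transaction: sum the
# time spent in each element along the request's path. The trace data
# and tier names are invented for illustration.
from collections import defaultdict

# One traced transaction: (tier, milliseconds spent in that tier).
trace = [
    ("web-server", 12.0),
    ("app-server", 48.0),
    ("database", 1350.0),
    ("app-server", 15.0),
    ("web-server", 8.0),
]

def attribute_latency(hops: list[tuple[str, float]]) -> dict[str, float]:
    """Total time spent in each element along the transaction path."""
    totals: dict[str, float] = defaultdict(float)
    for tier, elapsed_ms in hops:
        totals[tier] += elapsed_ms
    return dict(totals)

per_tier = attribute_latency(trace)
print(per_tier)                        # {'web-server': 20.0, 'app-server': 63.0, 'database': 1350.0}
print(max(per_tier, key=per_tier.get)) # 'database' -- the element to drill into next
```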

This is the crowdsourcing principle serving application performance management. Think about it: your data center handles an enormous number of end-user requests every day. Those requests tell you whether your application is performing well, whether it is starting to degrade, and whether its behavior is erratic. With the right transaction management technology, you can build the behavior model of those requests – how your users’ experience is affected by each element in your infrastructure, and exactly what changed so that something that worked yesterday, last week, or even an hour ago no longer works today. This visibility into behavior change is key to understanding your infrastructure and application performance. And with so many end-user requests, you just can’t go wrong…
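In its simplest form, that behavior-change detection could look like the sketch below: compare each tier’s median latency across many requests today against yesterday, and flag the tiers whose contribution shifted. The daily windows and the 50 percent shift threshold are illustrative assumptions, not a real product’s defaults:

```python
# Behavior-change detection over many end-user requests: flag the
# tiers whose median latency moved materially between two windows.
import statistics

def changed_tiers(yesterday: dict[str, list[float]],
                  today: dict[str, list[float]],
                  threshold: float = 0.5) -> dict[str, float]:
    """Tiers whose median latency shifted more than `threshold` (fractional)."""
    shifts = {}
    for tier, samples in today.items():
        base = statistics.median(yesterday.get(tier, samples))
        now = statistics.median(samples)
        if base > 0 and abs(now - base) / base > threshold:
            shifts[tier] = now / base  # how many times slower (or faster)
    return shifts

yesterday = {"web": [10, 12, 11], "db": [40, 42, 41]}
today     = {"web": [11, 10, 12], "db": [130, 140, 125]}
print(changed_tiers(yesterday, today))  # {'db': ~3.17} -- the database changed, not the web tier
```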