Monitoring vs. Diagnostics: The Balance of Proactive and Reactive

Ray Swanson

10 years ago

I have the great pleasure of being able to interact frequently with IT professionals. I get to learn about what keeps them up at night, how they balance projects, stay ahead of the curve and meet the needs of the organizations they support. What I find most interesting, regardless of the size of the organization or the number of admins and engineers on staff, is that there is often a lack of balance between the proactive and the reactive. In this post, I’ll limit this statement to the balance between monitoring and diagnostics. Most important, I’ll examine these two critical elements as they relate to IT visibility and how organizations need to consume the data and information to successfully navigate each.

Monitoring is often categorized as a luxury. It’s a job function that may have a dedicated staff, or perhaps is something given lip service, but not the time, energy and focus it deserves. At its core, monitoring is about being proactive. It’s about observing and checking quality, and includes important supporting elements such as alerting and reporting.

Diagnostics, on the other hand, is a reactive task: something performed when you are in jeopardy. It is a methodical process, a troubleshooting exercise, that involves identifying symptoms and ultimately determining root cause. Diagnostics is something everyone must be prepared for, but few invest the time up front. In fact, only those who have suffered (or are currently suffering) truly understand the need to mitigate the risk and put themselves in the best position to react accordingly.

What Does This Have to Do With End User Workloads?

When we look at the different approaches that provide for next generation desktop workloads, we are talking about technologies and platforms such as RDSH, VDI, published applications, terminal or remote desktop services, etc. In short, we are referring to the movement of end user workloads from the employee’s desk to the data center. Of course, this shift comes with profound operational and end-user benefits. But it also comes with an increase in complexity and, ideally, calls for a more thoughtful and strategic approach to support and to meeting user experience expectations.

From a monitoring and diagnostics perspective, the balance between proactive and reactive tasks is a bit of a cause-and-effect relationship. That is, the more focus placed on proactively monitoring the environment, the less you are at risk, and the less reactive you will need to be for events that are out of your control. That is not to say you will entirely avoid the need to reactively diagnose in an effort to minimize downtime. In fact, anyone who has spent any time managing a next generation data center-based desktop approach will tell you it is not a matter of ‘if’ you will experience issues; it is a matter of ‘when’ the wheels will come off the bus!

On that very specific note, one of the greatest points of interest I find when speaking to IT professionals is how little attention is paid to diagnostics versus monitoring. Monitoring is relatively easy. The risk in your monitoring choice is very low (as compared to your diagnostics choice) and many of the user interface and feature decisions are subjective. Further, monitoring is a bit easier to sell to leadership. It’s easy to bring a sexy dashboard to life on that big monitor in the network operations center. And tasks such as alerting and reporting will go a long way to getting the appropriate head nods from management. But how do you ‘sell’ diagnostics? How do you cost justify the opportunity to minimize downtime and keep users productive? It’s a lot like trying to cost justify insurance.

Don’t Wait for Downtime to React

The easiest way to justify a purchase is when you have no other choice. Wouldn’t it be awesome if you could purchase auto insurance after you’ve had an accident? I can share countless stories of organizations and top-notch engineers who were unable to dig themselves out of an end user issue. It’s unfortunate and very common to find professionals who have spent countless hours looking at server, storage and software architecture details gleaned from their existing monitoring platform, only to chip away at a major issue or, more commonly, never identify the root cause.

The ability to diagnose and leverage trends in a user-centric manner is relatively new. More importantly, diagnosing is far more difficult than monitoring and fraught with more risk. And unless you’ve lived to tell the tale, it’s extremely challenging to bring this important point to life. This is where a solution like Stratusphere UX shines. As both a monitoring and diagnostics solution, Stratusphere provides the balance to support your proactive and reactive tasks. Moreover, it does so with visibility to all users, all machines and all applications, all of the time.

The skills, tools, and ultimately the visibility required to proactively monitor versus reactively diagnose are very different. I have witnessed this firsthand. I have seen organizations invest in a monitoring platform, placing a significant emphasis on features such as alerting, reporting and dashboards, only to be left out in the cold when they are in jeopardy. That’s not to say monitoring features are not important, only that more effort must be placed on planning to react. In my next post, I’ll bring this very important distinction to life with a tale from a recent diagnostics project we were called in to support.