Grey Matter Still Required: Automated Diagnosis is a Myth

My dad spent his career as a tool and die maker, working on machine and metal cutting tools to support manufacturing and pre-production processes. While I never acquired these skills, I appreciated the benefits of this experience: when we broke something, we spent the time understanding the fix and typically went down to the shop and made a replacement part. As I grew older and began to find my way in the world, it occurred to me something had changed. We no longer fixed things; remediation had evolved into a simple swapping of components, without the diagnostics and troubleshooting I had observed as a youth.

It was also clear we were losing something more than amazing vocations and trades. We were losing the rigor and understanding necessary to build the knowledge, or grey matter, needed to rise above commodity or “good enough.” We seem to be looking for the easy button: something that will do our work for us, without the need to invest the time and build the expertise firsthand.

Pretty heavy stuff. And just like the next guy, I don’t want to have to reinvent the wheel. But there is a base level of knowledge and experience that is still necessary, and unfortunately I find far too many looking for a tool that will get them off the hook.

In my job I regularly try to spend time speaking with IT leaders and learning about their challenges and approaches to delivering the next-generation desktop workspace. I often have an opportunity to learn about the challenges these leaders face when scaling virtual desktop infrastructure, building an application delivery platform, or simply understanding the interplay between servers, storage and networking. And more often than I care to admit, these leaders tell me they need automatic diagnosis and recommendations from our solution—they want a product that will do the work for them, find the constraints and tell them specifically how to fix their woes.

PEBKAC, IOPs and CPU Queuing

I often have to bite my tongue during these interactions. No one likes to be told they are in over their head, or that their problem exists between the keyboard and chair (PEBKAC). I appreciate why there is a lack of understanding of user-centric workloads in the data center. It is, after all, a relatively new approach to delivering desktops. I also appreciate that delivering VDI, layered applications and multi-session architectures such as terminal services and RDS can be challenging at scale. And while there is much that can be learned from others in the industry, these efforts do not replace the need to understand and build expertise on these platforms, infrastructures and architectures. Most important, it is not prudent to avoid the effort altogether and think you can buy a tool that will do it for you.

The desktop world did not require an understanding of input and output operations per second (IOPs). CPU queue length, processor utilization and percent ready had never come up as a means to optimally size a user for a specific desktop pool. In the physical PC, page faults could be solved with a quick visit to Amazon for a $99 HDD to SSD upgrade. Move that workload into the data center and the expertise and understanding behind these metrics takes on a whole new level of importance. There are hundreds of key metrics and thousands of interactions among them that can affect user experience or bring down the entire system. This is not to say you should avoid these new desktop workspace delivery approaches, only that you need to invest in building your experience and adopting solutions that provide you with the necessary visibility.

Don’t Forget the Grey Matter—Experience Matters

When showing Stratusphere UX to potential customers, I’m often asked if it can provide recommendations, or that easy button, for the remediation of performance and user experience issues. Often folks don’t fully understand the metrics we trend or how they can be used to address resource constraints and ensure you are delivering the optimal user experience. And as much as the industry as a whole is getting better, this is not an easy thing to deliver.

And for those of you in the know, please take a look at the solutions that claim to offer recommendations and automated diagnosis in the specific platforms and architectures that support next-generation desktop workspaces. You’ll see metrics around increased host RAM utilization with suggestions such as “decrease the user density, or increase the quantity of host RAM.” Um, really. Thank you, Captain Obvious. Similarly, you might find an elevated IOPs metric that recommends you “increase your storage performance.” These tools simply offer a view of symptoms; they do not remove your responsibility to dig deeper and understand the interplay of the platform, the user and the true root cause.

By way of example, IOPs spikes are quite common in the delivery of virtual desktop infrastructure; Gartner and other industry pundits often name storage as the number one impediment to the success of next-generation workspaces. Similarly, poorly implemented user tiers and the incorrect sizing of a desktop pool will lead to memory paging, another common ailment in the delivery of virtual desktop workloads. Remember, the storage array is the VM hard drive, and it often takes the blame when paging to virtual disk creates a spike in IOPs. The initial diagnosis might be that you’ve either incorrectly provisioned the guest desktop VM or your storage is underperforming. Right? The easy remediation answer: go purchase a new tray of SSDs, and you’ll be right as rain.

True Visibility Provides True Root Cause

Without the understanding and visibility provided by Stratusphere UX, you will miss the true root cause, something that could never be recommended by an automated diagnostics tool. In the above example, a deeper dive showed that the desktop in question had an application that often called for 12-15GB of RAM. In this case the application, Google Chrome, was being used within a 4GB user image; needless to say, this user was quite unhappy with VDI. Can you blame him? His desktop session was constantly paging virtual RAM to physical disk (remember the storage array and the IOPs spike).
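The arithmetic behind that paging is simple enough to sketch. The Python below is only a back-of-the-envelope illustration under loud assumptions: the function name `paging_overflow_gb` is hypothetical, the 12 and 15 GB figures are the application demand from the example above, and the OS, other processes and any memory compression are ignored. It is not how Stratusphere UX or any other product computes anything.

```python
def paging_overflow_gb(demand_gb: float, guest_ram_gb: float) -> float:
    """Return the memory demand (in GB) that does not fit in guest RAM.

    Simplifying assumption: whatever does not fit in RAM has to be paged
    to the guest's virtual disk, which lives on the shared storage array.
    """
    return max(0.0, demand_gb - guest_ram_gb)


if __name__ == "__main__":
    guest_ram_gb = 4.0  # size of the user image in the example
    for demand_gb in (12.0, 15.0):  # application demand range in the example
        overflow = paging_overflow_gb(demand_gb, guest_ram_gb)
        print(
            f"{demand_gb:.0f} GB demand in a {guest_ram_gb:.0f} GB guest: "
            f"roughly {overflow:.0f} GB has to live in the page file"
        )
```

The number tells you the guest is short on memory. It does not tell you why the demand is there in the first place, and that part still takes a person.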

With a few more clicks in Stratusphere UX and a brief conversation with the user, we learned he would often have 20 or more browser tabs open. Simply switching between and reloading tabs was causing significant user experience issues that were gaining the attention of upper management (not in a good way). The customer initially believed it needed to set up and provision a 15GB VM for the user, or worse, spend a few thousand dollars on a new tray of SSDs. However, the fix for this problem was far easier: we simply provided a little user training on how best to use a web browser.

The metrics and visibility provided by Stratusphere UX are pretty comprehensive. The details we help unearth in the process of monitoring and diagnosing user experience issues go a long way to help you make better decisions, which leads to higher levels of success with next-generation desktop workspace projects. We can’t fully automate diagnostics for you (I’m sorry), but we can provide the visibility to make you more successful.
