The Next Step: What IT Needs to Overcome the Challenges of Managing Virtual Environments
By Doug MacEachern, Chief Technology Officer - Hyperic
Published: Thursday, September 20, 2007
Virtualization has come of age and is rising to a place of prominence in a growing majority of production environments. According to the Enterprise Management Associates research report “Virtualization: Exposing the Intangible Enterprise” (July 2006), 75% of all enterprises have deployed virtualization in one form or another and 95% of them are planning to deploy virtualization.
This new era brings with it better hardware utilization, fluid computing and rapid deployment for a generation of flexible web implementations. It also brings unprecedented management challenges to the operators who manage the business-critical services that virtualization powers—challenges that most are ill-equipped to handle.
As deployment environments swell and become more dynamic, how do operators know where to look to monitor system performance? As composite applications gobble up dozens to hundreds of services across increasing numbers of subsystems, how do they assess application availability? If a single machine runs several operating systems, how do they perform basic diagnostics?
The old rules go out the window.
For companies to successfully cross the “Virtual Divide,” there must be a paradigm shift in how organizations approach managing these systems.
Virtualization software carves out a logical, or virtual, machine from a physical one. This creates an artificially divided relationship between the applications inside the Virtual Machine (VM) and the physical resources they consume. This lack of visibility across the divide increases the propensity for configurations to change as administrators are forced to move VMs to improve performance rather than manage the performance of the VMs directly in context with the applications running inside them.
As a result, an IT administrator can wake up one morning to a “virtual sprawl” where the infrastructure has shrunk from 100 physical boxes to 25 but grown to 250 new VMs, with a greater combination of operating systems, applications and changing, complex infrastructure that needs to be managed.
Under the traditional systems management model, the task of monitoring virtualization performance and efficiency starts and stops with the virtual server software itself. Operators typically judge success by measuring what the virtual server says about how much of its physical resources are being consumed and distributed by the VM.
This approach, used by most traditional systems management vendors today, has two key limitations. First, since it focuses only on virtual server-level metrics, it is unable to show both the impact of the VM on the physical hardware and the overall performance of the hardware. Second, this approach ignores the overall performance and stability of the virtualized guest and applications.
Imagine building a new CRM application using an Apache web server on the front end of a Tomcat server, all running on a Linux platform with a MySQL database collecting the data. This application is currently running on an off-the-shelf, dual-CPU server that occupies a slot in a rack deep in a data center. It doesn’t consume the full resources of the server, and is left to do its business alone on that box until one day the data center operators realize this has happened 20, maybe 50 times. They decide to virtualize these applications and make better use of their hardware investment.
In the next generation of this data center, that application now lives in a VM alongside four other VMs, each running different applications with different technologies, including a J2EE app, a .NET app and a mail server, each of which has its own appetite for resources.
Then Christmas comes, and the CRM system is flooded with new customers. Suddenly, the CRM system seems to have exceeded its capacity for assigned resources. Yet, the VMware internal monitoring tools suggest the VM has adequate capacity. The inevitable question arises: Is virtualization the cause?
Unfortunately, this question isn’t easy to answer.
On the lower, system-based level, it’s difficult to decipher how the VM monitoring tools choose to allocate resources to this now very busy VM. The original assessment of whether the CRM app fit within this particular physical host was based on a completely different load profile.
On a higher, application-centric level, the team lacks visibility into the performance of the different pieces of the supporting stack to determine if the application itself is the problem. This hinders their ability to anticipate performance issues before users start complaining, regardless of whether the app was running in a VM or on a dedicated host.
Given the pressing business need and lack of definitive visibility, the only solution to this problem is to dynamically re-provision the VM to another, less taxed virtualized host machine. This creates the modern-day system administrator’s game of “whack-a-mole,” where lack of end-to-end visibility, the dynamic nature of web application load, and the sheer number of moving parts in an environment force applications to be moved from one side of virtualized infrastructure to another without solving the underlying problems.
The above scenario undermines the purpose of virtualization in the data center.
Virtualization makes the challenges of problem management far more complex than before, and that’s why a robust and sophisticated systems management capability is critical in virtualized environments.
Clearly a new approach is needed, one that gives data centers the ability to consolidate complete discovery, monitoring, analysis and control of all application, system and network assets, both inside and outside of the VMs powered by VMware.
It’s still early days in the market, and a holistic solution to the problem of managing virtual environments is a ways off. Today there is a slew of new virtualization management vendors popping up, and the traditional Big 4 are beginning to get the gears moving behind virtualization management solutions of their own. The problem is that each of these new solutions offers insight into only part of the picture, leaving the administrator to patch the results together to try to figure out where the problem is.
What should IT look for in the interim?
The best solution is one that provides a single window into both the virtual and physical environments, and helps the sys admin make sense of the two. To do this, IT should look for a solution that starts with an inventory model that represents a server, and all its applications and components within an intuitive set of relationships that apply to any combination of physical or virtual hardware, software, and services - regardless of the technologies involved.
Ideally, the system should automatically discover new software and services as they come online – either on the physical server or within the VM itself. It also should allow the administrator to group resources into logical sets of similar services and applications, and keep up with their movements.
Essentially, the management solution needs to be as fluid as the applications it’s monitoring. This means that the IT department managing these environments must have a single pane of glass where they can aggregate side-by-side comparisons, drill into detailed diagnostics and simultaneously apply alert policies to ensure they can meet the demanding service-level agreements for the business.
The CRM application example above illustrates how virtualization allows for rapid provisioning, deployment, and relocation of complete application environments. IT needs a monitoring system that moves just as quickly to detect every layer of a virtualized environment and react accordingly when any of those resources are relocated.
For example, it would have helped the team enormously if, in that scenario, their system had been able to automatically discover the Virtual Server host along with the Linux guest, the Apache server, and the MySQL installation with its corresponding tables and indexes. The next logical step is to map these resources with “parent-child relationships” to their counterparts on the physical Virtual Server. This mapping provides operations with a complete view of the CRM application, as well as how it relates to the physical host on which it lives.
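As a rough illustration of the parent-child mapping described above, the following Python sketch models the CRM stack as a simple tree. It is not how any particular management product stores its inventory; the `Resource` class, the `kind` labels, and names such as `crm-linux-guest` and `customers-table` are all hypothetical and chosen only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Resource:
    """One node in a hypothetical inventory model (all names are illustrative)."""
    name: str
    kind: str  # e.g. "physical-host", "vm-guest", "server", "service"
    parent: Optional[Resource] = field(default=None, repr=False)
    children: List[Resource] = field(default_factory=list, repr=False)

    def add(self, name: str, kind: str) -> Resource:
        """Create a child resource and record the parent-child relationship."""
        child = Resource(name, kind, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Return the names from the physical host down to this resource."""
        names: List[str] = []
        node: Optional[Resource] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def walk(self) -> Iterator[Resource]:
        """Yield this resource and everything beneath it, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def build_crm_inventory() -> Resource:
    """Assemble the CRM example: host -> Linux guest -> Apache, Tomcat, MySQL."""
    host = Resource("virtual-server-host", "physical-host")
    guest = host.add("crm-linux-guest", "vm-guest")
    guest.add("apache", "server")
    tomcat = guest.add("tomcat", "server")
    tomcat.add("crm-webapp", "service")
    mysql = guest.add("mysql", "server")
    mysql.add("customers-table", "table")  # hypothetical table name
    return host
```

Calling `lineage()` on the webapp node returns the chain from the physical host down to the service, which is the kind of "where does this live" answer the mapping is meant to give. If the VM were relocated, only the guest's parent would need to change, and everything beneath it would move with it.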
Once the virtualized environment is properly mapped, the next challenge is to provide monitoring visibility into every layer. IT teams should look for solutions that provide out-of-the-box support for collecting real-time and historical values of performance and health statistics at every layer of the infrastructure—including the virtualization software itself.
In order to properly assess the performance of the CRM application, it’s important to see all of this data in concert. Before, all that was known was that the VM’s CPU utilization had maxed out, and the VM was moved to another location with more CPU without any information or analysis on what caused the change. It could only be assumed that the increased transaction load increased the appetite for CPU resources. The fix only suppressed the problem; it didn’t uncover or address the root cause, and it certainly didn’t optimize the virtualized environment.
The administrators in the CRM example needed a solution that would have allowed them to see the trend of the CPU utilization on the CRM VM as it moved toward the maximum threshold. If they had had this information in real time, it would have been clear that the Tomcat server had consumed all the free memory on the OS and stopped responding. In this case, the administrator could correlate the memory usage from the OS with the memory usage from the Tomcat server and the Tomcat webapp services. That correlation would make it evident that the memory usage of one particular webapp service corresponds to the number of requests served by the webapp. This classic example of a memory leak was always there, and was exacerbated by the increased usage.
Armed with this information, action can be taken. First, the webapp developers can be alerted to a probable memory leak in their application and directed to a specific service to fix. Next, instead of moving the VM to a larger location, the Tomcat server can be restarted to clear out the memory.
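One simple way to look for the kind of relationship described here is to correlate each memory metric against the cumulative number of requests served. The sketch below uses a hand-written Pearson correlation as one possible technique; it is not what any specific monitoring tool does. The function names and metric names are assumptions, and any sample figures fed to it would be illustrative rather than measured.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series cannot track anything
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(
    requests: Sequence[float], metrics: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank metrics by how strongly they move with the request count."""
    scored = [(name, pearson(requests, series)) for name, series in metrics.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)
```

A metric whose memory usage climbs in step with cumulative requests and never falls back would rank near the top and is a candidate for a leak. Correlation alone does not prove one, so the result is a pointer for the developers rather than a diagnosis.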
Admins need new choices that will not only give them the right information, but give it to them well ahead of an outage so they can actually solve the problem.
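Getting a warning "well ahead of an outage" generally means alerting on the trend rather than on the threshold itself. The following sketch, which assumes a plain least-squares line fit over recent samples, estimates how long a metric has before it reaches a limit. The function names, the sample format, and the `lead_time` parameter are hypothetical and exist only for this example.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (time, value), e.g. (minutes, percent memory used)


def time_to_threshold(samples: Sequence[Sample], threshold: float) -> Optional[float]:
    """Estimate time units until the fitted trend reaches the threshold.

    Returns None when the metric is flat or falling, 0.0 when the fitted
    line is already at or above the threshold.
    """
    if len(samples) < 2:
        return None
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        return None  # all samples share one timestamp, so no trend can be fitted
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / spread
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    last_t = max(t for t, _ in samples)
    fitted_now = intercept + slope * last_t
    return max(0.0, (threshold - fitted_now) / slope)


def should_alert(samples: Sequence[Sample], threshold: float, lead_time: float) -> bool:
    """Alert when the projected breach falls within the desired lead time."""
    eta = time_to_threshold(samples, threshold)
    return eta is not None and eta <= lead_time
```

A straight-line fit is a deliberate simplification. Real load is rarely linear, so a production policy would likely want a longer window, some smoothing, or a seasonal baseline before paging anyone.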
The solution in the CRM example is a good one because long-term and unexpected outages can be limited, and the VM can likely continue in its current location once the fix has been applied. The business continues normal services, and the IT team has taken another step forward in optimizing its virtualized environment.
These growing pains are typical for new technologies. Systems management always seems to be the second cousin to any IT deployment. As virtualization usage accelerates into the front office, systems management vendors are starting to rise to the occasion. They don’t have a choice: either they embrace it, or they become irrelevant for the entire next wave of IT innovation. Without effective systems management, IT departments endeavoring to virtualize large portions of their businesses will most certainly fail.
Doug MacEachern is co-founder and chief technology officer of Hyperic. Prior to co-founding Hyperic, MacEachern was a senior software engineer at Covalent where, with Hyperic CEO Javier Soltero, he shaped the development of the product that would eventually become Hyperic HQ. He is a recognized leader within the open source community, having designed, implemented and maintained both generations of the mod_perl project and contributed to other projects including Apache httpd/apr, Perl and PHP. MacEachern has also given several talks at open source conferences and co-authored the book "Writing Apache Modules in Perl and C," published by O'Reilly in 1999.
He has more than 10 years of open source and commercial development experience, and in addition to Covalent, has held senior software engineering positions at Backflip.com, Critical Path and the Open Software Foundation.
MacEachern received his BA in Communications from the University of Maine.