Virtual Infrastructure Optimization: Essential For Virtualizing Business-Critical Applications
By Len Rosenthal
published: Tuesday, December 08 2009



 

As virtualization continues to be deployed with business-critical applications, IT organizations are requiring production-grade performance management and infrastructure optimization solutions to cost-effectively maintain service levels. A number of leading industry analysts have identified a new category of solutions, called Virtual Infrastructure Optimization (VIO), now emerging to address these requirements.

 

VIO solutions extend and enhance device management and capacity planning, efforts that are severely complicated by server and storage virtualization. The ability to quickly move virtual machines, and the dynamic nature of the connections they make to physical and virtual resources, commonly cause new levels of contention and significant capacity problems. These are not properly addressed by existing products designed for more static environments. Specifically, storage and I/O performance issues are often the biggest challenge to successful server virtualization deployments. New configuration, conflict, and contention issues arise that expose the lack of complete visibility across the server, network and storage domains. Virtual Infrastructure Optimization solutions solve this problem by offering cross-domain visibility with a focus on the performance of the entire virtual infrastructure, including deep insight into the storage area network – the biggest potential performance bottleneck. 

 

Virtualization Meets Business-Critical Applications

Over the last decade, server virtualization has been used primarily for development, test, and lower-end file-serving applications. Today, virtualization solutions such as VMware are considered production-grade. As IT organizations deploy virtual servers into production, it is imperative to recognize that design and deployment strategies for business-critical applications are not the same as they were in the dev/test environment.

 

Server consolidation initiatives for development and test generally provided users with similar or slightly improved performance, mainly because larger, more powerful physical servers were deployed, shared storage was implemented, and dev/test virtual machines were frequently idle. Sharing actually led to better service because the shared resources were typically over-provisioned. Server consolidation ratios (enabled primarily by more efficient use of available CPU cycles) were acceptable and generated enough capital expense reductions to justify some excess capacity in the storage or memory realms. As a result, server virtualization optimization efforts to date have focused mainly on capacity savings. The performance gains users experienced were a side effect rather than something that was architected. Contention, when it occurs, is typically resolved manually by moving a virtual machine or resizing it after users complain.

 

Capacity-focused planning and reactive contention management in a virtualized production environment can not only mask serious underlying architectural problems but can also quickly become operationally devastating. Early-mover enterprises that have deployed business-critical applications on virtualized infrastructures have exposed many unforeseen and complex contention problems. As virtualization becomes mainstream for business-critical applications, it requires a new class of performance planning and utilization optimization tools.

 

You Can’t Optimize What You Can’t Measure

In IT operations, it is well known that if you can’t monitor something, you can’t manage it. The corollary is that you can’t optimize what you can’t measure. Traditional IT operations have focused on element or device management: improving visibility of the behavior of servers, switches, storage arrays, network devices, etc. Virtualization changes the game: these elements are now mobile, dynamically reconfigured, and connected to each other in real time. The run-time interaction of elements becomes more important than their individual operating profiles. In other words, the complete virtual infrastructure becomes the most critical element to monitor, measure, and optimize in real time.

 

Evidence for this comes from reports that very few IT organizations (~25% according to the Taneja Group) are actively using VMware’s vMotion. The risk of using vMotion with no visibility into the I/O and SAN infrastructure is simply too great: performance problems can’t be easily identified or resolved. This lack of virtual infrastructure visibility lessens the benefits of virtualization by leading to lower consolidation ratios and an increase in the overall expenses associated with virtualized applications.

 

The proven benefits of virtualization can themselves contribute to this lack of visibility. Virtual machine (VM) dynamic resource scheduling (such as VMware DRS) generally relies on coarse-grained thresholds and can lead to serious contention. Consolidation means that an overloaded host will affect performance of every VM running on it, often catastrophically. Dynamic VM resizing for optimal CPU and memory sharing becomes geometrically more complex with increasing consolidation ratios. And, each added layer of virtualization (server, I/O, storage) introduces new potential for bottlenecks.

 

Of particular importance is the lack of end-to-end visibility when server virtualization is coupled with storage virtualization. Over 70% of VMware deployments drive the implementation of storage virtualization (most often with a Fibre Channel SAN), and performance at the storage tier becomes integral to and inseparable from overall application performance. SAN I/O latencies are greater than server CPU and memory latencies by a factor of 10 to 100, and are therefore much more likely to impact overall application response time. Also, in many enterprises, storage is managed by a distinct operations team with unique skills and special tools. As a result, there might be adequate element visibility in each of the server and storage domains, but the data is not integrated or correlated, nor is there a clear picture of who “owns” a performance issue. Add to this a complex, multi-vendor storage infrastructure and virtual machine architectures that virtualize the I/O path, and it’s clear that storage and I/O optimization is essential.
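
As a rough illustration of what integrating and correlating the two domains could look like, the Python sketch below joins server-side response-time samples with storage-side latency samples taken at the same time on the same volume, then reports how much of each VM's response time was spent waiting on the SAN. The record types, field names, and sample numbers are all hypothetical, chosen only for illustration; they are not drawn from any particular VIO product or monitoring API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSample:
    """Hypothetical server-side sample: response time observed by a VM."""
    timestamp: int       # seconds since start of capture
    vm: str
    lun: str             # storage volume the VM was using
    response_ms: float


@dataclass(frozen=True)
class StorageSample:
    """Hypothetical storage-side sample: I/O latency measured on the SAN."""
    timestamp: int
    lun: str
    io_latency_ms: float


def storage_share_by_vm(server_samples, storage_samples):
    """Return {vm: fraction of response time attributable to SAN latency}.

    Samples are matched on (timestamp, lun). Server samples with no matching
    storage sample are skipped rather than guessed at.
    """
    san_latency = {(s.timestamp, s.lun): s.io_latency_ms for s in storage_samples}
    total_response = defaultdict(float)
    total_storage = defaultdict(float)
    for sample in server_samples:
        key = (sample.timestamp, sample.lun)
        if key not in san_latency:
            continue
        total_response[sample.vm] += sample.response_ms
        # Storage time cannot exceed the response time the VM observed.
        total_storage[sample.vm] += min(san_latency[key], sample.response_ms)
    return {
        vm: total_storage[vm] / total_response[vm]
        for vm in total_response
        if total_response[vm] > 0
    }


if __name__ == "__main__":
    # Made-up samples, for illustration only.
    server = [
        ServerSample(0, "vm-a", "lun-1", 20.0),
        ServerSample(1, "vm-a", "lun-1", 40.0),
        ServerSample(0, "vm-b", "lun-2", 10.0),
    ]
    storage = [
        StorageSample(0, "lun-1", 15.0),
        StorageSample(1, "lun-1", 30.0),
        StorageSample(0, "lun-2", 1.0),
    ]
    for vm, share in sorted(storage_share_by_vm(server, storage).items()):
        print(f"{vm}: {share:.0%} of response time spent on SAN I/O")
```

The point of the sketch is the join key: neither data set alone says whether a slow VM is a server problem or a storage problem, but once both are indexed by time and volume, ownership of the issue becomes a matter of arithmetic rather than debate.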

 

This lack of ‘server to spindle’ visibility and correlation is more than a theoretical challenge: our experience shows that many large enterprises on the forefront of production-level virtualization document issues which, if left unaddressed, threaten to destroy the cost savings of server consolidation. Without a cross-domain ability to identify and quickly resolve performance problems, IT departments purchase additional storage or switch capacity to buy time until the next failure. Without an independent source of correlated performance data, hours are wasted pointing fingers between server and storage teams. Without real-time infrastructure data, operations are reactive, labor-intensive, and performed under duress. And without a clear picture of the root causes of performance problems, administrators are unlikely to flex the infrastructure or automate management operations.