Empire Management, Episode 2: Optimizing Virtualization Performance
By Anil Desai

published: Friday, March 21 2008

Whether your primary job function is more like that of Han Solo – avoiding Imperial pursuit forces – or that of Darth Vader (doing said pursuing), you know that performance is important. Part of every IT manager’s mission is to squeeze as much potential performance out of existing investments as possible. While your data center might resemble a massive Death Star, it’s important that its individual components run as smoothly as, say, a TIE Fighter.

In my previous article in this series, Empire Management 101, I focused on topics related to how you can monitor the performance of your virtualization host servers and the VMs that they support.  In this article, I’m going to focus on the application of this information – how you can use performance details to make better decisions about how to deploy and distribute your VMs.

Prioritizing Workloads

In the world of IT infrastructure, not all workloads are created alike. Some applications and services are absolutely mission-critical. Disruptions in performance or in service will trigger an immediate, not-so-kind communiqué from the affected user(s). Other workloads – such as test and development computers or those that host seldom-used programs – are less important. When deciding how to distribute your VMs, a good first step is to assign some kind of priority to them. Figure 1 provides a high-level example.

 

20080321-1.gif

Figure 1: Prioritizing workloads based on importance and requirements


Categorizing Workload Resource Requirements

Priorities are important for determining workload placement, but the main balancing act you’ll need to keep in mind is that of managing system resources: namely CPU, memory, disk, and network systems. When mixing and matching VMs, it’s ideal to deploy systems with a “compatible” combination of requirements. You should start by profiling your workload requirements, as shown in Table 1.

 

20080321-2.gif
Table 1: Categorizing VMs based on resource requirements

In the table, I have simplified things by using a subjective classification method of high, medium, and low. It’s better than nothing, but if you can replace the values with objective statistics (such as disk throughput and network bandwidth numbers), you can compare the details with the host server’s capacity. You can then decide which workloads are compatible. For example, several Development Test VMs can reside on the same server, as long as there’s sufficient network capacity and physical memory. Of course, there might be other requirements. For example, security will typically dictate that you shouldn’t place a Domain Controller VM on the same server as a public web server.
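
To make the comparison against host capacity concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Demand` record, the units, the 20% headroom, and the sample numbers are hypothetical and do not come from Table 1. It simply adds up the peak requirements of a group of VMs and checks the totals against what a host can supply.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    """Peak resource needs of a VM, or the capacity of a host (hypothetical units)."""
    cpu_ghz: float
    memory_gb: float
    disk_mb_s: float
    network_mb_s: float


RESOURCES = ("cpu_ghz", "memory_gb", "disk_mb_s", "network_mb_s")


def fits_on_host(vms, host, headroom=0.2):
    """Return True if the summed VM peaks stay within host capacity,
    while leaving the fraction `headroom` of every resource unused."""
    usable = 1.0 - headroom
    for name in RESOURCES:
        total = sum(getattr(vm, name) for vm in vms)
        if total > getattr(host, name) * usable:
            return False
    return True


# Made-up numbers: one host and a handful of identical development/test VMs.
host = Demand(cpu_ghz=16.0, memory_gb=32.0, disk_mb_s=25.0, network_mb_s=100.0)
dev_test_vms = [Demand(1.0, 4.0, 2.0, 10.0) for _ in range(4)]

print(fits_on_host(dev_test_vms, host))
```

A check like this covers only raw capacity. Rules such as keeping a Domain Controller away from a public web server would need to be enforced separately.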

Also, keep in mind that some types of workloads – such as a very busy database server – might not be a good virtualization candidate at all.  In the following sections, I’ll assume that we’re considering only those workloads that are good options for running within virtual machines.

The Performance Optimization Process

Many organizations tend to take a reactive approach to performance optimization.  When users complain, it’s time to look at performance.  The problem is that at least some damage has already been done by the time you’ve heard about the issue.  Some of that can be solved by performance monitoring.  In other cases, it’s important to implement a proactive optimization process.  Figure 2 provides an overview of a standard performance optimization process.

20080321-3.gif

Figure 2: Steps in a performance optimization process

While the steps might not be rocket science (or even TIE Fighter science), it’s important to have a process. Perhaps the most important part is to make a single change at a time. It’s tempting to flip a bunch of levers or push numerous buttons all at once to see if there’s an improvement. Even if that works, though, you won’t really know what you did to make things better. And, what if one change increases performance by 20% and another decreases it by 15%? The net result (an apparent 5% improvement) might leave you feeling quite satisfied, even though making the first change alone would have given you the full 20%. That should keep you comfortable as you await Darth Vader’s choking powers.
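
The one-change-at-a-time rule can be sketched as a small loop. This is only an illustration under stated assumptions: the setting names (`bigger_cache`, `extra_logging`) and the toy scoring function are hypothetical, and the gain and loss are modeled as simple additions to a baseline score of 100. In practice, `measure` would be whatever repeatable test you run after each change.

```python
def tune_one_at_a_time(baseline, changes, measure):
    """Try candidate changes one by one and keep only those that help.

    baseline: dict of current settings
    changes:  list of (setting_name, new_value) candidates
    measure:  callable(settings) -> score, where higher is better
    """
    settings = dict(baseline)
    best = measure(settings)
    kept = []
    for name, value in changes:
        trial = dict(settings, **{name: value})
        score = measure(trial)
        if score > best:
            # The change helped: adopt it and raise the bar.
            settings, best = trial, score
            kept.append(name)
        # Otherwise the trial is discarded and the settings stay as they were.
    return settings, best, kept


def toy_measure(settings):
    """Hypothetical scoring: one setting adds 20 points, the other costs 15."""
    score = 100
    if settings.get("bigger_cache"):
        score += 20
    if settings.get("extra_logging"):
        score -= 15
    return score


both_at_once = toy_measure({"bigger_cache": True, "extra_logging": True})
final, score, kept = tune_one_at_a_time(
    {}, [("bigger_cache", True), ("extra_logging", True)], toy_measure
)
print(both_at_once, score, kept)
```

Applying both changes together scores 105, while testing them individually keeps only the helpful one and reaches 120.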

Another important aspect of the process is that it could theoretically go on forever – you can always improve performance.  With each iteration through the loop, you’ll typically get diminishing returns.  At some point, it will be time to consider performance to be “good enough” and to move on to something more fun (might I recommend listening to Sy Snootles and the Max Rebo band in a local cantina?).

Performance Testing Approaches

Monitoring existing systems is all well and good, but what can you do to avoid potential performance problems for applications that are yet to be deployed?  A common requirement is to reduce the risk of performance issues when moving from an application running on a physical server to one running within a VM.  Or, you might be planning to deploy a new application within a VM with little to go on other than a Force-like intuition that it will work.  Regardless of your midi-chlorian count, you can perform some basic testing to help avoid problems.  Figure 3 provides an overview of several approaches to testing performance characteristics.

 

20080321-4.gif

Figure 3: Comparing performance testing approaches

One rather simple type of testing is through the use of synthetic benchmarks. These hardware stress tests can be used to determine the absolute performance capabilities of your hardware and of the VMs that it supports. For example, you might want to determine the maximum disk throughput you can achieve from using Direct-Attached Storage (DAS) devices on a host server. You might find a sustained rate of 25 MB/sec. If you know the performance requirements of your VMs, you can then safely assume that the server can accommodate peak disk throughput up to that level. The primary advantage of synthetic benchmarks is that the tests are easy to run (and repeat), and you can use them on a wide variety of targets. If you decide to move to an iSCSI storage environment, for example, you can simply re-run your tests to measure throughput over the network. (Note, however, that other considerations – such as latency and the size of the average I/O operation – might be bigger issues.)
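
For a sense of what a synthetic disk test does, here is a deliberately crude Python sketch that times a sequential write to a temporary file. The function name, block size, and file size are arbitrary assumptions, and operating system caching or storage-level compression can skew the number, so treat it as an illustration of the idea and not a replacement for a dedicated benchmarking tool.

```python
import os
import tempfile
import time


def sequential_write_mb_per_sec(total_mb=64, block_kb=1024, directory=None):
    """Write `total_mb` of data in `block_kb` chunks and return MB/sec.

    `directory` selects the volume to test (defaults to the system temp dir).
    """
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure the data has been pushed to the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the scratch file
    return (blocks * block_kb / 1024) / elapsed


rate = sequential_write_mb_per_sec(total_mb=16)
print(f"Sequential write: {rate:.1f} MB/sec")
```

Pointing `directory` at a different volume lets you repeat the identical test against another storage target and compare the results.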

Load testing is the next step up and can require a significant amount of effort to obtain results.  This approach involves simulating end-user activity on the application.  You might choose to test the performance of a web server when running directly on physical hardware and compare the results with the same workload running within a VM (automated P2V tools can really help simplify the setup process).  The drawback is that you must have a relevant test to use.  For simple web applications and some types of databases, generic benchmarking tools are available.  In other cases, you might need to roll your own or resort to manual load testing (the latter often requires lots of pizza and IT people with tremendous patience).

The most accurate method of predicting performance is to use historical information.  Assuming it’s available, you can determine past average resource utilization statistics to establish a baseline.  You can then look at peaks and determine the amount of resource capacity your servers will require.  Clearly, tracking performance can provide some huge pay-offs.
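
As a rough sketch of turning history into a baseline, the Python below computes the average and the peak from a series of utilization samples and then sizes capacity from the peak. The sample values and the 25% safety margin are made up for illustration; real data would come from whatever monitoring tool you already use.

```python
import statistics


def summarize(samples, margin=0.25):
    """Return the baseline (average), the peak, and a capacity target.

    The capacity target is the observed peak plus the fraction `margin`.
    """
    peak = max(samples)
    return {
        "average": statistics.mean(samples),
        "peak": peak,
        "capacity_needed": peak * (1 + margin),
    }


# Hypothetical hourly CPU utilization readings (percent) for one server.
cpu_percent = [20, 22, 25, 21, 80, 23, 24, 22, 26, 27]

summary = summarize(cpu_percent)
print(summary)
```

With these numbers the average is 29% while the peak is 80%, which shows why sizing for the average alone would leave the server short during its busiest hour.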

Automating Performance Management

If you’ve made it this far, I’ll bet that there’s one nagging question on your mind: How do I find the time and resources to manage my entire environment? As much as I depend upon them for operations, I can’t trust Stormtroopers with these types of tasks. Perhaps a fleet of droids would help?

If you’re sold on the value of performance monitoring and optimization, you can improve operations by investing in automated performance management tools.  Numerous vendors provide virtualization-aware suites that can help automate tracking and analysis of resource utilization statistics.  Figure 4 provides a list of some of the features you should look for.

 
20080321-5.gif

Figure 4: Automating performance management

This is a fairly lengthy list of features, but a few should be called out as particularly important. First, measuring performance as users see it is an important consideration. No one will care that overall CPU utilization on a database server is low when they’re having trouble generating reports. Ideally, a performance testing product will be able to simulate real end-to-end user activity like selecting an item, placing an order, and receiving a confirmation via a web-based storefront.

Dynamic resource reallocation can automatically take corrective actions whenever manual intervention isn’t required. For example, an automated utility can increase the physical memory allocation for a memory-starved VM that’s causing lots of paging to occur. Better yet, VMs can be magically transported between host servers to rebalance VMs based on their actual usage patterns (vs. what you originally predicted). The overall goal is to create a fluid environment that automatically adapts to changing requirements wherever possible.
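
The rebalancing idea can be illustrated with a very simple greedy rule. The Python sketch below is an assumption-laden toy, not how any particular product works: the host and VM names are hypothetical, load is a single number per VM, and the function only suggests one move between the busiest and the least busy host.

```python
def suggest_migration(hosts):
    """Suggest one VM move that narrows the load gap between two hosts.

    hosts: dict of host name -> dict of VM name -> measured load
           (for example, percent of host CPU actually used)
    Returns (vm, source_host, target_host), or None if no move helps.
    """
    load = {name: sum(vms.values()) for name, vms in hosts.items()}
    source = max(load, key=load.get)  # busiest host
    target = min(load, key=load.get)  # least busy host
    gap = load[source] - load[target]
    best = None
    for vm, usage in hosts[source].items():
        # Gap between the two hosts if this VM were moved.
        new_gap = abs((load[source] - usage) - (load[target] + usage))
        if new_gap < gap:
            best, gap = (vm, source, target), new_gap
    return best


# Made-up measurements for two hosts.
measured = {
    "host-a": {"web": 40, "db": 35, "build": 15},
    "host-b": {"file": 10},
}
print(suggest_migration(measured))
```

Here the sketch proposes moving the "web" VM from "host-a" to "host-b", which leaves both hosts at a load of 50. A real tool would also weigh memory, storage, migration cost, and placement rules.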

Summary

Armed with the appropriate performance monitoring and optimization approaches, there’s a good chance that The Empire could have been better managed. Would it have made a difference in the Clone Wars and in resisting The Rebellion? Perhaps, but that was a long time ago, in a galaxy far, far away. What about the data centers of here and now? Chances are, you initially used virtualization for server consolidation. Through the use of performance optimization approaches, you can ensure that you maximize resource utilization while still meeting business requirements. May the Force be with you (and your servers).


 

 

 