By Anil Desai
Published: Friday, March 21, 2008
Whether your primary job function is more like that of Han
Solo – avoiding Imperial pursuit forces – or that of Darth Vader (doing said
pursuing), you know that performance is important. Part of every IT manager’s mission is to
squeeze as much potential performance out of existing investments as
possible. While your data center might
resemble a massive Death Star, it’s important that its individual components
run as smoothly as, say, a TIE Fighter.
In my previous article in this series, Empire Management
101, I focused on topics related to how you can monitor the performance of your
virtualization host servers and the VMs that they support. In this article, I’m going to focus on the application
of this information – how you can use performance details to make better
decisions about how to deploy and distribute your VMs.
Prioritizing Workloads
In the world of IT infrastructure, not all workloads are
created alike. Some applications and services
are absolutely mission-critical.
Disruptions in performance or in service will prompt an
immediate, not-so-kind communiqué from the affected user(s). Other workloads, such as test and
development computers or those that host seldom-used programs, are less important. When deciding how to distribute your VMs, a
good first step is to assign some kind of priority to them. Figure 1 provides a high-level example.
Figure 1:
Prioritizing workloads based on importance and requirements
Categorizing Workload Resource Requirements
Priorities are important for determining workload placement,
but the main balancing act you’ll need to keep in mind is that of managing
system resources: namely CPU, memory, disk, and network systems. When mixing and matching VMs, it’s ideal to
deploy systems with a “compatible” combination of requirements. You should start by profiling your workload
requirements, as shown in Table 1.

Table 1: Categorizing
VMs based on resource requirements
In the table, I have simplified things by using a subjective
classification method of high, medium, and low.
It’s better than nothing, but if you can replace the values with
objective statistics (such as disk throughput and network bandwidth numbers),
you can compare the details with the host server’s capacity. You can then decide which workloads are
compatible. For example, several
Development Test VMs can reside on the same server, as long as there’s
sufficient network capacity and physical memory. Of course, there might be other
requirements. For example, security will
typically dictate that you shouldn’t place a Domain Controller VM on the same
server as a public web server.
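As a rough sketch of that comparison, the following Python snippet adds up the measured requirements of a group of candidate VMs and checks the total against a host server's capacity. The resource names, the sample numbers, and the 20% headroom are all assumptions made for illustration; they are not values from Table 1.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    """Objective measurements; the units are assumptions for illustration."""
    cpu_ghz: float
    memory_gb: float
    disk_mb_s: float
    net_mbit_s: float


def fits_on_host(vms, host, headroom=0.20):
    """Return True if the combined VM demand stays within host capacity.

    headroom reserves a fraction of every host resource (assumed 20%)
    for the virtualization layer and for unexpected peaks.
    """
    usable = 1.0 - headroom
    for field in ("cpu_ghz", "memory_gb", "disk_mb_s", "net_mbit_s"):
        demand = sum(getattr(vm, field) for vm in vms)
        if demand > getattr(host, field) * usable:
            return False
    return True


# Hypothetical numbers: one host, a development/test VM, and a busy database VM.
host = Resources(cpu_ghz=16.0, memory_gb=32.0, disk_mb_s=100.0, net_mbit_s=1000.0)
dev_test_vm = Resources(cpu_ghz=1.0, memory_gb=4.0, disk_mb_s=5.0, net_mbit_s=50.0)
database_vm = Resources(cpu_ghz=4.0, memory_gb=16.0, disk_mb_s=90.0, net_mbit_s=200.0)

print(fits_on_host([dev_test_vm] * 3, host))          # three dev/test VMs together
print(fits_on_host([dev_test_vm, database_vm], host))  # dev/test VM plus the database
```

A check like this covers only raw capacity. Rules such as keeping a Domain Controller away from a public web server would have to be added as separate constraints.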
Also, keep in mind that some types of workloads – such as a
very busy database server – might not be a good virtualization candidate at
all. In the following sections, I’ll
assume that we’re considering only those workloads that are good options for
running within virtual machines.
The Performance Optimization Process
Many organizations
tend to take a reactive approach to performance optimization. When users complain, it’s time to look at
performance. The problem is that at
least some damage has already been done by the time you’ve heard about the
issue. Some of that can be solved by
performance monitoring. In other cases,
it’s important to implement a proactive optimization process. Figure 2 provides an overview of a standard
performance optimization process.

Figure 2: Steps in a
performance optimization process
While the steps might not be rocket science (or even TIE
Fighter science), it’s important to have
a process. Perhaps the most important
portion is to make a single change at a time.
It’s tempting to flip a bunch of levers or push numerous buttons all at
once to see if there’s an improvement.
Even if that works, though, you won’t really know what you did to make
things better. And, what if one change
increases performance by 20% and another decreases it by 15%? The net result (an apparent 5% improvement)
might leave you feeling quite satisfied, even though one of the changes is actively hurting performance.
That should keep you comfortable as you await Darth Vader’s choking powers.
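To see how a combined change can hide a harmful one, here is a small worked sketch. It assumes a hypothetical baseline of 100 transactions per second and treats the two percentages both ways: added against the same baseline (the apparent 5% mentioned above) and compounded one after the other. Either way, the 15% regression disappears into the total.

```python
baseline_tps = 100.0   # hypothetical baseline throughput
gain = 0.20            # change A: +20%
loss = 0.15            # change B: -15%

# Both effects measured against the original baseline.
additive_tps = baseline_tps * (1 + gain - loss)

# Effects applied one after the other.
compounded_tps = baseline_tps * (1 + gain) * (1 - loss)

# What you would have kept by applying only the helpful change.
only_gain_tps = baseline_tps * (1 + gain)

print(round(additive_tps, 1), round(compounded_tps, 1), round(only_gain_tps, 1))
```

Testing each change on its own is the only way to find out that the second one should be rolled back.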
Another important aspect of the process is that it could
theoretically go on forever – you can always improve performance. With each iteration through the loop, you’ll
typically get diminishing returns. At
some point, it will be time to consider performance to be “good enough” and to
move on to something more fun (might I recommend listening to Sy Snootles and
the Max Rebo band in a local cantina?).
Performance Testing Approaches
Monitoring existing systems is all well and good, but what
can you do to avoid potential performance problems for applications that are
yet to be deployed? A common requirement
is to reduce the risk of performance issues when moving from an application
running on a physical server to one running within a VM. Or, you might be planning to deploy a new
application within a VM with little to go on other than a Force-like intuition
that it will work. Regardless of your
midi-chlorian count, you can perform some basic testing to help avoid problems. Figure 3 provides an overview of several
approaches to testing performance characteristics.
Figure 3: Comparing
performance testing approaches
One rather simple type of testing is through the use of
synthetic benchmarks. These hardware
stress tests can be used to determine the absolute performance capabilities of
your hardware and of the VMs that it supports.
For example, you might want to determine the maximum disk throughput you
can achieve from using Direct-Attached Storage (DAS) devices on a host
server. You might find a sustained rate
of 25 MB/sec. If you know the performance
requirements of your VMs, you can then safely assume that the server can
accommodate peak disk throughput up to that level. The primary advantage of synthetic benchmarks
is that the tests are easy to run (and repeat), and you can use them on a wide
variety of targets. If you decide to
move to an iSCSI storage environment, for example, you can simply re-run
your tests to measure throughput over the network. (Note, however, that other considerations,
such as latency and the size of the average I/O operation, might be bigger issues.)
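For a sense of what a synthetic disk test does, here is a minimal Python sketch that times a sequential write and reports MB/sec. It is not a substitute for a real benchmarking tool: the function name, file size, and block size are assumptions chosen for illustration, and it measures only sequential writes, not reads, random I/O, or latency.

```python
import os
import tempfile
import time


def sequential_write_mb_per_sec(directory, total_mb=32, block_kb=1024):
    """Write total_mb of data to a temp file in directory and return MB/sec."""
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push data to the device, not just the OS cache
        elapsed = time.perf_counter() - start
    return (blocks * block_kb / 1024) / elapsed


if __name__ == "__main__":
    rate = sequential_write_mb_per_sec(tempfile.gettempdir())
    print(f"Sustained sequential write: {rate:.1f} MB/sec")
```

Pointing the directory argument at a DAS volume and then at an iSCSI-backed volume would give two numbers you can compare directly, which is the main appeal of this kind of test.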
Load testing is the next step up and can require a
significant amount of effort to obtain results.
This approach involves simulating end-user activity on the
application. You might choose to test
the performance of a web server when running directly on physical hardware and
compare the results with the same workload running within a VM (automated P2V
tools can really help simplify the setup process). The drawback is that you must have a relevant
test to use. For simple web applications
and some types of databases, generic benchmarking tools are available. In other cases, you might need to roll your
own or resort to manual load testing (the latter often requires lots of pizza
and IT people with tremendous patience).
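If you do end up rolling your own, the core of a load test is small. The sketch below runs a user action from several simulated users at once and summarizes the response times. The function names are hypothetical, and the fake request is only a stand-in that sleeps briefly; you would replace it with a real call to your application.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(user_action, users=10, requests_per_user=20):
    """Run user_action concurrently and return latency statistics in seconds."""

    def one_user(user_id):
        latencies = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            user_action()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))

    all_latencies = sorted(t for user in per_user for t in user)
    # Approximate 95th percentile using a simple rank into the sorted list.
    p95_index = max(0, int(len(all_latencies) * 0.95) - 1)
    return {
        "requests": len(all_latencies),
        "mean": statistics.mean(all_latencies),
        "p95": all_latencies[p95_index],
        "max": all_latencies[-1],
    }


def fake_storefront_request():
    """Stand-in for a real request (hypothetical); replace with an HTTP call."""
    time.sleep(0.005)


if __name__ == "__main__":
    print(run_load_test(fake_storefront_request, users=5, requests_per_user=10))
```

Running the same test against the physical server and then against the VM gives you two sets of numbers to compare under an identical workload.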
The most accurate method of predicting performance is to use
historical information. Assuming it’s
available, you can determine past average resource utilization statistics to
establish a baseline. You can then look
at peaks and determine the amount of resource capacity your servers will
require. Clearly, tracking performance
can provide some huge pay-offs.
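Turning historical data into a baseline and a capacity figure takes only a few lines. The following sketch assumes you already have a list of utilization samples; the sample values and the 25% headroom on top of the observed peak are made up for illustration.

```python
import statistics


def capacity_estimate(samples, headroom=0.25):
    """Summarize utilization samples and suggest a capacity target."""
    ordered = sorted(samples)
    # Approximate 95th percentile using a simple rank into the sorted list.
    p95 = ordered[max(0, round(0.95 * len(ordered)) - 1)]
    peak = ordered[-1]
    return {
        "baseline": statistics.mean(ordered),
        "p95": p95,
        "peak": peak,
        "suggested_capacity": peak * (1 + headroom),
    }


# Hypothetical hourly disk throughput samples in MB/sec for one workload.
history = [4, 5, 5, 6, 5, 7, 18, 22, 9, 6, 5, 4]
print(capacity_estimate(history))
```

Note how far the peak sits above the average in this made-up data. Sizing a host from the baseline alone would leave the workload starved during its busiest hours.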
Automating Performance Management
If you’ve made it this far, I’ll bet that there’s one
nagging question on your mind: How do I
find the time and resources to manage my entire environment? As much as I depend upon them for operations,
I can’t trust Stormtroopers with these types of tasks. Perhaps a fleet of droids would help?
If you’re sold on the value of performance monitoring and
optimization, you can improve operations by investing in automated performance
management tools. Numerous vendors
provide virtualization-aware suites that can help automate tracking and
analysis of resource utilization statistics.
Figure 4 provides a list of some of the features you should look for.

Figure 4: Automating
performance management
This is a fairly lengthy list of features, but a few should
be called out as particularly important.
First, measuring performance as users see it is an important
consideration. No one will care that
overall CPU utilization on a database server is low when they’re having trouble
generating reports. Ideally, a
performance testing product will be able to simulate real end-to-end user
activity like selecting an item, placing an order and receiving a confirmation
via a web-based storefront.
Dynamic resource reallocation can take
corrective actions whenever manual intervention isn’t required. For example, an automated utility can
increase the physical memory allocation for a memory-starved VM
that’s causing lots of paging to occur.
Better yet, VMs can be magically transported between host servers to
rebalance workloads based on their actual
usage patterns (vs. what you originally predicted). The overall goal is to create a fluid
environment that automatically adapts to changing requirements wherever
possible.
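Commercial tools keep their rebalancing logic to themselves, but the basic idea can be sketched with a simple greedy rule: when a host runs above a utilization threshold, move its smallest VM to the least-loaded host. Everything here (the data layout, the 80% threshold, the host and VM names) is a hypothetical illustration, not how any particular product works.

```python
def rebalance(hosts, threshold=0.80):
    """Suggest VM moves from hosts above threshold to the least-loaded host.

    hosts maps a host name to {"capacity": float, "vms": {vm_name: load}}.
    Returns a list of (vm, source, target) tuples and updates hosts in place.
    """
    if len(hosts) < 2:
        return []

    def utilization(name):
        host = hosts[name]
        return sum(host["vms"].values()) / host["capacity"]

    moves = []
    for source in sorted(hosts, key=utilization, reverse=True):
        while utilization(source) > threshold and hosts[source]["vms"]:
            target = min((h for h in hosts if h != source), key=utilization)
            # Move the smallest VM first to keep each migration cheap.
            vm = min(hosts[source]["vms"], key=hosts[source]["vms"].get)
            load = hosts[source]["vms"][vm]
            target_load = sum(hosts[target]["vms"].values()) + load
            if target_load / hosts[target]["capacity"] > threshold:
                break  # the least-loaded host cannot absorb it; leave it for a human
            hosts[target]["vms"][vm] = hosts[source]["vms"].pop(vm)
            moves.append((vm, source, target))
    return moves


# Hypothetical loads, expressed in the same units as each host's capacity.
hosts = {
    "host-a": {"capacity": 100.0, "vms": {"web1": 30.0, "web2": 25.0, "db1": 40.0}},
    "host-b": {"capacity": 100.0, "vms": {"test1": 10.0}},
}
print(rebalance(hosts))
```

A real product would also weigh the cost of each migration, VM priorities, and placement rules before moving anything.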
Summary
Armed with the appropriate performance monitoring and
optimization approaches, there’s a good chance that The Empire could have been
better managed. Would it have made a
difference in the Clone Wars and in resisting The Rebellion? Perhaps, but that was a long time ago, in a
galaxy far, far away. What about the
datacenters of here and now? Chances
are, you initially used virtualization for server consolidation. Through the use of performance optimization
approaches, you can ensure that you maximize resource utilization while still
meeting business requirements. May the
Force be with you (and your servers).