Server Virtualization Storage Based Performance "Gotcha"
By Marc Staimer
Published: Monday, June 09 2008
Server virtualization has become an irresistible force sweeping into
the world's data centers. With compelling cost and management savings
from server consolidation, server virtualization's future seems
secure. Or is it? All may not be perfect in the world of virtualized
servers.
It is not uncommon for system administrators to find stunning
application performance degradation when moving from the physical
world to the virtual one. Invariably, the drop-off shows up after the
pilot has moved to production, and efforts to fix it bring
significant frustration. Both the problem and the answer lie within
the SAN storage.
There are four definitive bottlenecks that can and will crater
virtualized server application performance if not managed correctly:
- Oversubscription within the virtualized server.
- Oversubscription within the HDD and target storage system queues.
- Oversubscription within the SAN fabric.
- Oversubscription within the target storage ports.
All four revolve around the concept of oversubscription.
Oversubscription means that the amount of potential bandwidth
assigned to a given port or device is much greater than the bandwidth
actually available. It takes advantage of statistical probability: it
is highly unlikely that all of the users or applications sharing that
bandwidth will use it at exactly the same time. This allows for much
higher utilization of the assets and significant cost savings from
fewer idle assets. It makes huge economic sense and has been used for
hundreds of years, in everything from hot bunking on naval vessels to
traditional phone systems and the Internet. It is a sound concept.
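The statistical argument can be made concrete with a small sketch. It assumes, purely for illustration, that each user sharing a resource is active independently with probability `p_active`; the article makes no such assumption, and real workloads are often correlated. Under that assumption the binomial distribution gives the chance that concurrent demand exceeds what the resource can serve at once. The function name and sample figures are hypothetical.

```python
from math import comb


def prob_demand_exceeds_capacity(users: int, p_active: float, capacity: int) -> float:
    """Chance that more than `capacity` of `users` are active at the same time.

    Assumes each user is active independently with probability `p_active`
    (an illustrative simplification, not a measured workload model).
    """
    return sum(
        comb(users, k) * p_active**k * (1 - p_active) ** (users - k)
        for k in range(capacity + 1, users + 1)
    )


if __name__ == "__main__":
    # Hypothetical figures: a resource that can serve 2 users at once,
    # each user busy 10% of the time, at rising oversubscription levels.
    for users in (4, 8, 16, 32):
        risk = prob_demand_exceeds_capacity(users, 0.10, 2)
        print(f"{users:>2} users on capacity 2 ({users // 2}:1): risk {risk:.4f}")
```

Running the loop shows the risk of contention climbing as more users are assigned to the same fixed capacity, which is the trade-off that oversubscription makes.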
The downside of oversubscription is the risk that users and
applications will concurrently attempt to use all of the assigned
capacity, resulting in much reduced performance. The risk is
generally low as long as there is not too much oversubscription. And
that's the rub. Each level of oversubscription multiplies the ones
beneath it, and the cumulative effect dramatically increases the
probability of that downside. A deeper examination of each of these
oversubscription bottlenecks shows how.
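One rough way to see the multiplying effect is to treat each layer's oversubscription as a ratio and multiply the ratios together. The sketch below does that. The layer names and the guests-per-host figure are illustrative assumptions; the 8:1 fabric figure is the average cited later in this column.

```python
from math import prod


def compound_oversubscription(layer_ratios: dict[str, float]) -> float:
    """Multiply per-layer ratios (N means N:1) into one end-to-end ratio."""
    return prod(layer_ratios.values())


if __name__ == "__main__":
    # Hypothetical layers; only the 8:1 fabric ratio comes from the text.
    layers = {
        "guests per virtualized server": 10,
        "server initiators per storage target port": 8,
    }
    total = compound_oversubscription(layers)
    print(f"End-to-end: {total:g} guests per storage target port")
```

A modest ratio at each layer compounds into a much larger end-to-end ratio, which is why no single layer can be sized in isolation.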
Oversubscription within the virtualized server
Oversubscription at the server is how server virtualization works.
Too much oversubscription occurs when too many guests and
applications compete for the same server resources. One factor that
complicates deciding how many is too many is the resource intensity
of each application.
A second factor is the hypervisor's storage virtualization layer.
This is where the LUNs (SCSI logical unit numbers) assigned to the
physical server are carved up by the hypervisor into virtual LUNs.
The assigned target LUN in a traditional SAN storage system is tied
to a specific number of drives in a RAID group (usually no more than
8). Whereas the physical world has unique LUNs for each server, the
virtual server world has multiple virtual machines accessing the same
LUN (meaning the same disks) at the same time. This is compounded by
oversubscription at the queues.
Oversubscription within the HDD and target storage system
Each HDD has a limited queue depth that allows multiple commands to
stack up before a busy signal is sent back to the storage system. The
storage system itself also has a limited queue depth before it sends
a busy signal back to the application. The queue depth per Fibre
Channel or SAS drive is 256 to 512. The queue depth per SATA drive is
at most 32 and more often than not 0 (a depth of 32 requires command
queuing in the disk controller, which is atypical).
This means that LUNs drawn from SATA disk RAID groups are far more
likely to have busy contention than LUNs drawn from RAID groups with
SAS or Fibre Channel disks. Even then, there can be disk contention
if there is a high number of IO- or throughput-intensive guests on
the hypervisor.
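A back-of-the-envelope check for this kind of contention can use the queue depths quoted above. The model is a deliberate simplification: it assumes outstanding commands spread evenly across the drives in the RAID group and ignores the storage system's own queue and cache. The function name, the guest count, and the per-guest outstanding IO figure are hypothetical.

```python
def raid_group_queue_headroom(
    drives: int, per_drive_queue_depth: int, guests: int, outstanding_per_guest: int
) -> int:
    """Queue slots left in a RAID group after all guests issue their IO.

    A negative result means commands would be refused with a busy signal.
    Assumes commands spread evenly over the drives (a simplification).
    """
    total_slots = drives * per_drive_queue_depth
    total_outstanding = guests * outstanding_per_guest
    return total_slots - total_outstanding


if __name__ == "__main__":
    # 8-drive RAID group; 20 guests with 32 outstanding commands each
    # (hypothetical load). Queue depths are the figures from the text.
    for label, depth in (("SATA with queuing", 32), ("FC/SAS", 256)):
        headroom = raid_group_queue_headroom(8, depth, 20, 32)
        state = "busy" if headroom < 0 else "ok"
        print(f"{label}: headroom {headroom} ({state})")
```

With these sample numbers the SATA group runs out of queue slots while the Fibre Channel or SAS group still has headroom, matching the contention pattern described above.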
Oversubscription within the SAN fabric
SANs are oversubscribed by design. Best practices call for an average
ratio of 8:1 between server initiators and storage target ports.
Application servers with higher IO or throughput intensity require a
lower oversubscription ratio. Application servers with lower IO or
throughput intensity can tolerate a much higher one.
When physical application servers are consolidated through server
virtualization and the SAN is not re-architected to reflect virtual
server oversubscription, there is a much higher probability of
application performance cratering. Poorly engineered SAN fabric
oversubscription will lead to significant fabric blocking.
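To illustrate what reflecting virtual server oversubscription might mean in practice: if a fabric was sized for a given number of application workloads per storage target port, the number of virtualized hosts that can share that port shrinks by the number of guests each host carries. The helper below is a hypothetical sizing sketch, not a vendor formula, and it assumes each guest generates roughly the load of one former physical server.

```python
def max_hosts_per_target_port(workloads_per_port: int, guests_per_host: int) -> int:
    """Hosts that can share one target port without exceeding the
    workload count the fabric was originally sized for.

    Returns 0 when even a single host carries more guests than that.
    """
    if guests_per_host < 1:
        raise ValueError("guests_per_host must be at least 1")
    return workloads_per_port // guests_per_host


if __name__ == "__main__":
    # 8:1 is the average from the text; guest counts are hypothetical.
    for guests in (1, 2, 4, 8, 16):
        hosts = max_hosts_per_target_port(8, guests)
        print(f"{guests:>2} guests per host -> {hosts} host(s) per target port")
```

A result of 0 signals that a single consolidated host already exceeds what one target port was sized for, so the host would need more than one path to storage under this simple model.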
Oversubscription within the target storage ports
Just as too much oversubscription within the SAN fabric can cause
blocking that substantially reduces application-to-storage
performance, so too can too much oversubscription at the target
storage ports.
Conclusion
Oversubscription is not a bad thing; in fact, it is very useful in
increasing asset utilization and reducing costs. Unfortunately, as
the cliché goes, too much of a good thing leads to bad consequences.
My next column will discuss methodologies and solutions that help
alleviate these bottlenecks.
Marc Staimer is president and CDS of Dragon Slayer Consulting in
Beaverton, Oregon. He is widely known as one of the leading storage
market analysts in the network storage and storage management
industries. His consulting practice of 6+ years provides consulting to
the end-user and vendor communities. Most of his consulting is in the
areas of strategic planning as well as product and market development.
Staimer's 23 years of marketing, sales and business experience in the
storage, software and systems industries, combined with his years of
research into the MIS community, give him unique business, systems and
market expertise.