Virtualization's Impact on IT Operations - Part Two
By Kevin Lees
Published: Tuesday, July 01 2008
In the first article of this series, I explored some of the advantages
virtualization provides to IT Operations. I viewed those advantages through the
eyes of someone responsible for IT system operations in a pre-virtualization
environment at an ASP (namely me), and I focused on how they would positively
impact Service Level Management. In this article, as well as in the next and
final one, I'll look at some of the challenges virtualization presents to IT
Operations as it pursues virtualization to relieve the seemingly "Atlas Carrying
the Weight of the World on His Shoulders" pressure of Service Level Management.
This article addresses what I see as some of the underlying Service Level
Management challenges presented by virtualization, along with the tools
available to begin addressing them. The last article in the series will look at
additional virtualization challenges facing IT Operations from an ITIL
perspective, and therefore primarily from a process view.
To begin, what are the overarching challenges virtualization thrusts upon IT
Operations and Service Level Management? First, there is the added complexity.
In addition to physical servers, IT Operations must now contend with multiple
virtual machines per physical server and the resources they require, not to
mention keeping track of resource pool constraints across multiple physical
servers. And virtualization affects more than the server infrastructure: there
is storage virtualization to consider as well (not to mention the attention I/O
virtualization will require in the near future). Second, there is the
increasingly dynamic nature of virtualization. Between manually moving "live"
virtual machines between physical servers and having them moved automatically
in response to resource needs or physical server problems, how can you monitor
and manage something if you don't even know where it physically resides?
Additionally, how does the virtual infrastructure relate to and impact the rest
of your IT infrastructure? Lastly, there is the challenge created by the ease
with which virtual machines can be created and deployed. You want to be
responsive to changing business needs; you may even want to put control of
virtual machine deployment in the hands of the department whose business
function a given virtual machine supports (don't panic; smelling salts help!),
but what havoc might that wreak on your infrastructure? How can IT Operations
begin to address these challenges?
Let's start by defining four key aspects of Service Level Management:
proactively planning to prevent problems before they occur; quickly and
efficiently provisioning systems into service in response to changing business
needs; monitoring the infrastructure to detect problems before the users do; and
troubleshooting infrastructure performance issues to minimize unplanned
downtime. While each of these applies to a purely physical server environment,
virtualization compounds them. Let's address the "why" of each, in turn, as well
as the tools available to begin addressing it. Before I begin, though, a
disclaimer. While I try to be virtualization vendor agnostic, many of the tools
I'll identify in this article support only VMware-based environments. Some do
address other vendors' solutions (specifically Citrix's XenServer and Microsoft
Virtual Server), and those that don't currently say they will support other
virtualization vendors' technologies as the market need arises. That said, I'll
identify support for non-VMware platforms where appropriate.
When I refer to proactive planning, I'm not talking about the massive planning
effort involved in consolidating an entire datacenter. Don't get me wrong: these
large-scale planning efforts are obviously important, and fortunately excellent
tools for planning an entire datacenter consolidation are becoming available.
For instance, Cirba's Data Center Intelligence software performs a detailed
analysis of the entire environment, taking into account not only workload
constraints but technical and business constraints as well. But the proactive
planning I have in mind is the day-in, day-out, project-in, project-out server
virtualization planning facing IT Operations: the type of planning (aka
capacity planning) needed to decide which ESX or Xen server should host that new
business application virtual machine while still maintaining needed resource
reserves. It is the kind of planning that, if done consistently, keeps resource
problems and unplanned downtime from occurring at the least opportune time (not
that there is EVER an opportune time).
Wouldn't it be nice if you could perform "what if" scenarios to "test"
virtualization infrastructure changes, optimizing a virtual machine's placement
by predicting the best use of available resources before implementation? Well,
you don't have to wait any longer: tools like Profiler from Tek-Tools Software,
Inc., HP's Insight Dynamics - VSE, Akorri's BalancePoint Performance Dynamics
Modeling™, and BMC's Performance Assurance solution provide the planning
assistance needed to understand the impact a virtual machine will have on
resource capacity. Using real-time and historical data, these tools help
determine the best placement for your new virtual machine workloads. But be
careful: while these tools address workload impact on the resources of a single
physical server hosting virtual machines, they do not yet take cross-server
resource pool constraints into account.
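To make the single-host "what if" idea concrete, here is a minimal Python sketch of the kind of placement check such tools automate. It is not how Profiler, Insight Dynamics, BalancePoint, or Performance Assurance actually work. It assumes each host's capacity and current demand can be summarized as single numbers (CPU in MHz, memory in MB, rather than the real-time and historical data the products use) and that IT Operations wants to keep a hypothetical 20% of each resource in reserve. The names `Host`, `VmDemand`, `headroom_after`, and `place_vm` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Host:
    name: str
    cpu_capacity_mhz: float
    mem_capacity_mb: float
    cpu_used_mhz: float   # current demand, e.g. summarized from monitoring data
    mem_used_mb: float

@dataclass
class VmDemand:
    cpu_mhz: float
    mem_mb: float

def headroom_after(host: Host, vm: VmDemand) -> float:
    """Fraction of the scarcer resource (CPU or memory) left free if vm lands on host."""
    cpu_free = 1 - (host.cpu_used_mhz + vm.cpu_mhz) / host.cpu_capacity_mhz
    mem_free = 1 - (host.mem_used_mb + vm.mem_mb) / host.mem_capacity_mb
    return min(cpu_free, mem_free)

def place_vm(hosts: List[Host], vm: VmDemand, reserve: float = 0.20) -> Optional[str]:
    """'What if' placement: try the VM on each host in turn, discard hosts that would
    drop below the reserve, and return the host with the most remaining headroom
    (None if no host qualifies). Each host is judged independently."""
    scored = [(headroom_after(h, vm), h.name) for h in hosts]
    viable = [(room, name) for room, name in scored if room >= reserve]
    if not viable:
        return None
    return max(viable)[1]

hosts = [
    Host("esx01", cpu_capacity_mhz=20000, mem_capacity_mb=32768,
         cpu_used_mhz=15000, mem_used_mb=20000),
    Host("esx02", cpu_capacity_mhz=20000, mem_capacity_mb=32768,
         cpu_used_mhz=8000, mem_used_mb=16000),
]
print(place_vm(hosts, VmDemand(cpu_mhz=2000, mem_mb=4096)))  # esx02
```

Here esx01 would be left with only 15% free CPU, below the assumed reserve, so esx02 wins. Because each host is evaluated on its own, the sketch shares the limitation described above: it knows nothing about cross-server resource pool constraints.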
As to the second key aspect of Service Level Management (in my opinion, at
least), quickly and efficiently provisioning systems into service in response
to changing business needs, virtualization adds a twist: virtual machines can be
provisioned quickly and efficiently, perhaps a bit too easily. Once templates
are established, or by using virtual machine cloning, vendor tools like
VirtualCenter and XenCenter make provisioning so easy that controlling it
becomes a concern, to say the least. If the decision has been made to let a
business unit provision its own virtual machines, many believe that
self-service may become IT Operations' bête noire.
I believe the resulting concern, that unrestrained virtual machine provisioning
will lead to the slow death of IT Operations through virtual sprawl, should be
addressed via process (specifically Change Management, which I'll come back to
in the third article in this series). Vendors recognize the importance of
wrapping a process around provisioning, and tools to address this are making
their way to market. At least three tools are available to control the
provisioning process: VMware's LifeCycle Manager, BMC's Virtualization Manager,
and Opalis' Integration Server. LifeCycle Manager and Virtualization Manager are
largely self-contained: all of the required tools, along with a workflow engine,
are included in the product. If you prefer to integrate with third-party or
existing tools, Opalis' Integration Server provides a middleware approach of
sorts that lets you integrate third-party products with Opalis' workflow engine
to create an automated provisioning process. Most importantly, each incorporates
an approval step before a virtual machine is provisioned. With these tools you
could, theoretically at least, allow a business unit to manage the lifecycle of
its own virtual machines, within the resource constraints defined by IT
Operations, without fear of virtual sprawl running amok in the datacenter.
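The approval-gated workflow these products wrap around provisioning can be sketched roughly as follows. This is a hypothetical Python illustration, not the design of LifeCycle Manager, Virtualization Manager, or Integration Server: a request is first checked against per-business-unit limits set by IT Operations (an assumed VM count and memory quota), then handed to an approval step, and only then "provisioned" (here, simply recorded in an inventory). `Quota`, `Request`, and `Provisioner` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Quota:
    max_vms: int
    max_mem_mb: int

@dataclass
class Request:
    business_unit: str
    vm_name: str
    mem_mb: int

@dataclass
class Provisioner:
    quotas: Dict[str, Quota]              # limits defined by IT Operations
    approve: Callable[[Request], bool]    # the approval step (a reviewer, a ticket, ...)
    inventory: Dict[str, List[Request]] = field(default_factory=dict)

    def submit(self, req: Request) -> str:
        quota = self.quotas.get(req.business_unit)
        if quota is None:
            return "rejected: no quota defined"
        owned = self.inventory.get(req.business_unit, [])
        if len(owned) + 1 > quota.max_vms:
            return "rejected: VM count quota exceeded"
        if sum(r.mem_mb for r in owned) + req.mem_mb > quota.max_mem_mb:
            return "rejected: memory quota exceeded"
        # Only requests that fit within IT Operations' limits reach the approver.
        if not self.approve(req):
            return "rejected: not approved"
        # Stand-in for the actual clone/deploy action.
        self.inventory.setdefault(req.business_unit, []).append(req)
        return "provisioned"

p = Provisioner({"marketing": Quota(max_vms=2, max_mem_mb=8192)},
                approve=lambda r: r.mem_mb <= 4096)
print(p.submit(Request("marketing", "web01", 4096)))  # provisioned
print(p.submit(Request("marketing", "db01", 6144)))   # rejected: memory quota exceeded
```

Checking the quota before the approval step means approvers never see requests that could not be honored anyway; in practice the approval would also feed the Change Management process.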
OK, I've attempted to identify tools that address the Service Level Management
challenges of planning and provisioning; what about monitoring the
infrastructure to detect problems before they affect users? There are certainly
adequate tools, typically those provided by the vendors, for monitoring the
virtual infrastructure, but what about integrated monitoring of the entire
infrastructure, of which the virtual infrastructure is but one component? At a
bare minimum, such an integrated monitoring tool needs to provide event
correlation, but what about the added challenge presented by the dynamic
movement of virtual machines? Knowing an event has occurred is valuable, but how
about the subsequent troubleshooting when the location of the virtual machine
generating the event is fluid? Tracking virtual machines that can easily be
moved manually is one thing; add virtual machines that are moved dynamically, as
with VMware's DRS product, and troubleshooting an event can become even more
time-consuming.
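One way to keep event correlation useful when virtual machines move is to record every placement change (manual or DRS-initiated) and resolve each event to the host the virtual machine was running on at the moment the event fired. The Python sketch below assumes moves are recorded in time order per VM and that events arrive as (timestamp, vm, message) tuples; `PlacementHistory` and `correlate` are hypothetical names, not part of any vendor product.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

class PlacementHistory:
    """Records where each VM ran over time so an event can be tied to the physical
    host it was on when the event fired, even after the VM has moved."""

    def __init__(self) -> None:
        self._times: Dict[str, List[float]] = defaultdict(list)
        self._hosts: Dict[str, List[str]] = defaultdict(list)

    def record_move(self, vm: str, host: str, at: float) -> None:
        # Assumes moves (manual or DRS-initiated) are recorded in time order per VM.
        self._times[vm].append(at)
        self._hosts[vm].append(host)

    def host_at(self, vm: str, at: float) -> Optional[str]:
        times = self._times.get(vm, [])
        i = bisect.bisect_right(times, at) - 1  # index of last move at or before `at`
        return self._hosts[vm][i] if i >= 0 else None

def correlate(events: Iterable[Tuple[float, str, str]],
              history: PlacementHistory) -> Dict[str, List[str]]:
    """Group (timestamp, vm, message) events by the host the VM was on at that time."""
    by_host: Dict[str, List[str]] = defaultdict(list)
    for at, vm, msg in events:
        host = history.host_at(vm, at) or "unknown"
        by_host[host].append(f"{vm}: {msg}")
    return dict(by_host)

h = PlacementHistory()
h.record_move("app01", "esx01", at=0)
h.record_move("app01", "esx02", at=100)   # e.g. a DRS migration
h.record_move("db01", "esx02", at=0)
events = [(50, "app01", "high latency"), (150, "app01", "high latency"),
          (160, "db01", "disk queue high")]
print(correlate(events, h))
```

In this run, the first app01 event is attributed to esx01 and the later one to esx02, where it lands alongside db01's disk-queue event. Two different virtual machines complaining from the same physical server at the same time is exactly the kind of pattern a host-level cause produces.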
For integrated, "single-pane-of-glass" event monitoring and correlation across
the physical and virtual infrastructures, vendors are again rising to the
challenge. All of the "big name" vendors have extended their operations
management solutions to cover the virtualization infrastructure, including HP's
Operations Manager; IBM's Tivoli Monitoring for Virtual Servers; BMC Software's
Performance Manager for Virtual Servers; Computer Associates' Unicenter; and
Microsoft's System Center Operations Manager. All support VMware-based
environments. In addition, BMC supports Citrix's and Microsoft's virtualization
offerings. Aside from the big-name operations management vendors, other vendors
offering integrated event management of the physical and virtual
infrastructures include Tek-Tools with its Profiler product and Avocent with
DSView3, while Woodstone's Servers Alive, a good, lower-cost solution, has added
VMware support. To address the fluidity of virtual machine placement, HP's and
Microsoft's tools can dynamically track virtual machine locations via nWare's
SPI for VMware.
The final key aspect of Service Level Management I mentioned was
troubleshooting infrastructure performance issues to minimize user
dissatisfaction or, in a worst-case scenario, unplanned downtime. This obviously
applies to a purely physical environment, but it is complicated in a virtualized
environment by multiple virtual machines on a physical server contending for
shared storage, memory, CPU, and network resources. Multiply this by a service
supported by multiple virtual machines on multiple physical servers, perhaps
sharing virtualized storage, and you begin to see the performance
troubleshooting challenge.
In my experience, performance troubleshooting, like programming, is an art.
While it can be codified to an extent, people who are really good at it are not
only very experienced but seem to have some innate ability that the rest of us
mere mortals can only wish for. Wouldn't it be nice to be able to easily
pinpoint performance bottlenecks in an end-to-end virtualized environment (that
is, from virtual machine to physical server to storage), so the rest of us could
experience what it feels like in that rarefied air? Welcome to the world of
cross-domain analysis. In this context, cross-domain analysis means applying
advanced analytics to connect the performance and availability of virtual
infrastructure components, and their associated physical components, with
resource capacity consumption.
Employing such a capability means the tool can determine where a performance
bottleneck, or resource contention, is occurring along the virtual machine /
physical server / storage path of the infrastructure, allowing quicker problem
identification and resolution. Vendors providing cross-domain analysis tools
include Akorri with BalancePoint Cross-Domain Analysis™, Tek-Tools Software with
its Profiler Software Suite, and BMC with ProactiveNet Analytics, which is part
of its Services Assurance family of solutions. This would seem to be a
"must-have" capability in any large-scale, virtualized production environment.
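As a very small taste of the cross-domain idea: if several virtual machines are slow, look for the physical component (host or datastore) that every slow virtual machine's path shares, then point at the busiest of those. The Python sketch below is a deliberate simplification, far short of the analytics in the products named above. It assumes each VM maps to exactly one host and one datastore, that host and datastore names never collide, and that utilization figures (0.0 to 1.0) come from existing monitoring. `VmPath` and `suspect_component` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

@dataclass(frozen=True)
class VmPath:
    vm: str
    host: str        # physical server currently running the VM
    datastore: str   # storage the VM's disks live on

def suspect_component(paths: List[VmPath], slow_vms: Set[str],
                      utilization: Dict[str, float]) -> Optional[str]:
    """Find physical components shared by every slow VM's path and return the
    busiest of them, or None if the slow VMs share nothing."""
    slow_paths = [p for p in paths if p.vm in slow_vms]
    if not slow_paths:
        return None
    shared = set.intersection(*({p.host, p.datastore} for p in slow_paths))
    if not shared:
        return None
    # Components with no utilization figure are treated as idle (0.0).
    return max(shared, key=lambda c: utilization.get(c, 0.0))

paths = [VmPath("app01", "esx01", "lun07"),
         VmPath("app02", "esx02", "lun07"),
         VmPath("web01", "esx01", "lun03")]
util = {"esx01": 0.45, "esx02": 0.50, "lun07": 0.92, "lun03": 0.30}
print(suspect_component(paths, {"app01", "app02"}, util))  # lun07
```

The two slow virtual machines run on different hosts, so neither host is a common factor; the shared datastore is, and it is also the most heavily used component. When the slow virtual machines share nothing, the sketch returns None rather than guessing.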
In closing, just as the placement of virtual machines on physical servers can
be fluid, so is the Service Level Management tool market. What I've attempted to
provide is merely a snapshot in time of currently available tools. Rest assured
that, given the speed with which virtualization is becoming an IT mainstay,
planning and management tools will be brought to market at an increasing (if
not exponential) rate. So far, the market seems to be dominated by the
traditional big players (HP, IBM, CA, BMC), but newer players like Tek-Tools
Software and Akorri portend what I believe will be a plethora of solid, focused
management tool vendors.
Kevin is the principal consultant at Premier Project Management, LLC,
where he specializes in IT infrastructure architecture development as
well as planning and managing IT infrastructure, virtualization, and
data center consolidation/relocation projects. He is a Project
Management Professional and VMware Certified Professional with 26+
years of technical, management, and consulting experience in systems
integration, project management and IT operations. Recent engagements
include performing an enterprise-wide IT Infrastructure &
Operations assessment as well as planning and managing a
multi-datacenter consolidation / relocation, using virtualization, in
the publishing industry; managing the implementation and operational
"go-live" of two e-mail platforms for an international e-mail ASP; and
providing technical project management and virtualization services for
the assessment phase of a multi-datacenter consolidation / relocation
project in the on-line, e-mail marketing service provider space. Kevin
can be reached at