Brocade HBA - Stream I/O Performance
By Jack Fegreus, published Friday, June 12, 2009
Streaming Multiple VM Backups to Minimize RTO
To minimize the recovery time objective (RTO), IT needs software that can run multiple VM backup processes in parallel. Before that software can deliver, however, IT needs server HBAs, such as the Brocade 815 and 825, that are capable of streaming full-duplex I/O at wire speed.
To alleviate the friction between business processes and inflexible IT
infrastructure, savvy CIOs are using virtual machines (VMs) in a Virtual
Operating Environment (VOE), such as that created by VMware, to provide the
reliability, availability, and scalability of racks of servers while reducing
capital and operating expenses. Nonetheless, the road to nirvana has its
complications. Virtualization dramatically increases I/O loads on servers, and
the expansion of government regulations on risk management highlights the
danger that the failure of a single host server will cascade to multiple VMs
running multiple applications. In this white paper, VSM Labs focuses explicitly
on the server I/O problems associated with implementing an end-to-end backup
and recovery process within a VOE.
With servers hosting eight or more VMs, an FC HBA can no
longer be regarded as a simple commodity product. As the number of VMs sharing
the HBAs in a host server increases, SAN fan-out becomes a server issue as well
as a switch issue. In a VOE server, HBAs have to play the role of virtual
switches for the virtual fabrics created by virtual HBAs assigned via N_Port ID
Virtualization (NPIV). To deal with this issue, Brocade HBAs employ a
high-performance ASIC that supports 500,000 IOPS per port and incorporates an
8-lane PCIe Gen 2.0 interface with 40 Gbit/sec of raw bandwidth in each
direction. This level of performance is particularly important for supporting
the high levels of SAN I/O throughput needed by end-to-end backup and recovery
processes in a VOE.
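The 40 Gbit/sec PCIe figure is a raw signaling rate, so it helps to work out how much usable bandwidth it leaves for FC traffic. The sketch below assumes PCIe Gen 2.0 signaling at 5 GT/s per lane with 8b/10b encoding, and uses the commonly quoted nominal rate of 800MB per second per direction for an 8-Gbit/sec FC port; the constant and function names are our own, for illustration only.

```python
# Back-of-envelope bandwidth check: an 8-lane PCIe Gen 2.0 slot versus
# 8-Gbit/sec Fibre Channel ports. All figures are per direction.

PCIE_GEN2_T_PER_LANE = 5_000_000_000  # 5 GT/s signaling rate per lane
ENC_DATA_BITS, ENC_LINE_BITS = 8, 10   # 8b/10b: 8 data bits per 10 line bits

FC_8G_NOMINAL_MB_S = 800  # assumed nominal 8GFC data rate, MB/s per direction


def pcie_gen2_raw_gbit(lanes: int) -> int:
    """Raw signaling bandwidth in Gbit/s for one direction."""
    return PCIE_GEN2_T_PER_LANE * lanes // 1_000_000_000


def pcie_gen2_usable_mb_s(lanes: int) -> int:
    """Bandwidth left after 8b/10b encoding, in MB/s (10^6 bytes), one direction."""
    data_bits = PCIE_GEN2_T_PER_LANE * lanes * ENC_DATA_BITS // ENC_LINE_BITS
    return data_bits // 8 // 1_000_000


def fc_ports_mb_s(ports: int) -> int:
    """Nominal per-direction FC bandwidth for a given number of 8GFC ports."""
    return ports * FC_8G_NOMINAL_MB_S


if __name__ == "__main__":
    lanes = 8
    usable = pcie_gen2_usable_mb_s(lanes)
    print("PCIe Gen 2 x8 raw:", pcie_gen2_raw_gbit(lanes), "Gbit/s")  # 40
    print("PCIe Gen 2 x8 usable:", usable, "MB/s")                    # 4000
    for ports in (1, 2):  # single-port (815) and dual-port (825) cases
        need = fc_ports_mb_s(ports)
        print(f"{ports} x 8GFC needs {need} MB/s; headroom {usable / need:.1f}x")
```

Under these assumptions, even a dual-port HBA running both ports at full rate uses well under half of the slot's usable bandwidth in each direction.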
To optimize resource utilization, sites typically run eight
or more VMs on host servers with multicore processors. Dense VM configurations
put significant stress on host server I/O throughput, because the host must
virtualize all of the SAN hardware for multiple VMs.
A major issue for VOE backup is the significant I/O overhead
incurred by a backup process that uses VMware Consolidated Backup (VCB). In a
VCB-based backup, all data must be read and written twice: once to a local
directory on a server dubbed the VCB proxy server, and then again to the backup
media. Because data is read and written simultaneously in both phases of a
VCB-based backup, achieving optimal efficiency requires a VCB proxy server that
can provide a very high level of I/O throughput.
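The double read-and-write pattern can be quantified with a small model. The sketch below assumes that both the proxy's local directory and the backup target are reached through the proxy's HBAs, so every byte crosses the proxy's FC links four times. The 720 GB data set and 500MB per second rate are purely illustrative, and the half-duplex case is a simplification in which reads and writes share one link sequentially.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhaseTraffic:
    name: str
    read_gb: float   # GB read through the proxy's HBAs in this phase
    write_gb: float  # GB written through the proxy's HBAs in this phase


def vcb_style_traffic(vm_data_gb: float) -> List[PhaseTraffic]:
    """Per-phase traffic at the proxy for a two-phase, VCB-style backup."""
    return [
        # Phase 1: read snapshot data from the ESX datastore and write it to
        # the proxy's local directory.
        PhaseTraffic("snapshot copy", vm_data_gb, vm_data_gb),
        # Phase 2: read the local copy back and write it to the backup media.
        PhaseTraffic("backup to media", vm_data_gb, vm_data_gb),
    ]


def phase_hours(t: PhaseTraffic, mb_s: float, full_duplex: bool = True) -> float:
    """Hours for one phase at mb_s per direction (1 GB = 1000 MB).

    With full duplex, reads and writes overlap, so the larger of the two sets
    the time; without it, they are assumed to take turns on the same link.
    """
    gb = max(t.read_gb, t.write_gb) if full_duplex else t.read_gb + t.write_gb
    return gb * 1000 / mb_s / 3600


if __name__ == "__main__":
    phases = vcb_style_traffic(720)  # hypothetical 720 GB of VM data
    for duplex in (True, False):
        hours = sum(phase_hours(p, 500, duplex) for p in phases)
        print(f"full_duplex={duplex}: {hours:.1f} h")  # 0.8 h, then 1.6 h
```

In this model, losing the ability to overlap reads and writes doubles the backup window, which is why full-duplex throughput at the proxy matters.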
What's more, VCB can move all data over a SAN using just the
HBAs installed in the VCB proxy server. That puts a premium on the ability to
reach high I/O throughput levels without manual tuning and intervention by
system and storage administrators. With operating costs dominating capital
costs for storage resources, any solution that requires significant manual
configuration or tuning cannot be cost-effective. Within that context, VSM Labs
examined the Brocade 815 and 825 HBAs, which are also available directly through
HP as the HP 82B PCIe single- and dual-port HBAs.
To provide a baseline for our VOE backup testing, we first
installed three single-port 8-Gbit/sec HBAs (a Brocade 815, a QLogic QLE2560,
and an Emulex LPe12000) in a quad-core HP ProLiant DL360 server running Windows
Server 2003 and the Intel Iometer benchmark. This server would later assume the
role of our VCB proxy server. At the center of our 8-Gbit/sec test fabric, we
configured a Brocade 300 switch.
To provide a complete VOE test environment, openBench Labs
used three servers, along with a SEPATON S2100-ES2 virtual tape library (VTL)
with two 4-Gbit/sec FC ports and a Xiotech® Emprise 5000 storage system, also
with two 4-Gbit/sec FC ports. We hosted eight VMs running Windows Server 2003 on
a quad-processor HP ProLiant DL580 server running VMware ESX Server 3.5, and we
managed our VOE from an HP ProLiant DL360 server running VMware vCenter Server
(a.k.a. VirtualCenter) on Windows Server 2003.
To handle the end-to-end backup process, we installed
Veritas NetBackup with VCB on a second quad-core HP ProLiant DL360 server
running Windows Server 2003. VCB installs a virtual LUN driver on a Windows
server, dubbed the VCB proxy. In a backup, VCB directs the ESX host to create a
snapshot of each logical volume of a VM. The proxy then uses the virtual LUN
driver to copy the snapshots into a local directory. As a result, the backup
application can back up that local directory of copied snapshots without any
processing impact on either the VMs or the ESX server. Because all of the VM
data flows through the proxy in this way, the VCB proxy server's SAN connection
is critical.
As the capabilities of devices such as the Brocade HBAs
advance exponentially, the number of applications able to leverage all of those
capabilities shrinks. The Brocade 815 and 825 8Gbps HBAs are a perfect example
of this trend. These HBAs support the highest levels of performance for both
bandwidth- and IOPS-intensive applications. The vast majority of standard
business applications, however, fall into only one of those camps.
Virtualization and backup are everyday applications that
require the highest level of I/O bandwidth; however, ultimate IOPS is well
beyond their modest transaction-processing needs. Moreover, current multicore
CPUs are very cost-effective at hosting VMs, but these CPUs simply cannot
process I/O requests with the low latency needed to exploit the ability of the
Brocade ASIC to sustain 500,000 IOPS per port.
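Why latency, rather than raw ASIC capability, limits IOPS follows from Little's law: outstanding I/Os = IOPS x mean latency. Rearranged, sustaining a target IOPS rate at a given queue depth requires a mean per-I/O latency no greater than queue depth divided by IOPS. The queue depths in this sketch are illustrative assumptions, not settings from our tests.

```python
def max_latency_us(target_iops: int, queue_depth: int) -> float:
    """Largest mean per-I/O latency, in microseconds, that still sustains
    target_iops with queue_depth I/Os outstanding (Little's law:
    outstanding = IOPS x latency, so latency = outstanding / IOPS).
    """
    return queue_depth * 1_000_000 / target_iops


if __name__ == "__main__":
    # Illustrative queue depths only.
    for qd in (1, 32, 256):
        print(f"queue depth {qd:>3}: <= {max_latency_us(500_000, qd):.0f} us per I/O")
    # queue depth 1 -> 2 us, 32 -> 64 us, 256 -> 512 us
```

At a queue depth of 32, for example, the whole host I/O path would have to complete each request in about 64 microseconds on average to keep 500,000 IOPS flowing through a single port.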
For backup, the important application-centric metric is
full-duplex (simultaneous read and write) throughput. An 8-Gbit/sec HBA must be
able to read data at 8 Gbit/sec and write data at 8 Gbit/sec at the same time.
Equally important is the balance between read and write throughput: a backup
can run no faster than its slowest device, so a read/write imbalance caps
backup throughput at the slower of the two rates.
Using Iometer to generate multiple sequential read and write
streams, we measured near-wire-speed full-duplex throughput only with the
Brocade 8Gbps HBAs. Total I/O throughput averaged 1,568MB per second, with
sustained reads of 786MB per second and sustained writes of 782MB per second.
The read and write I/O rates of the QLogic QLE2560 were
balanced; however, total throughput was about 13 percent lower than that
measured with the Brocade 815 HBA: 688MB per second for reads and 676MB per
second for writes. Those results pegged potential backup throughput at about
675MB per second, just slightly higher than the maximum throughput sustainable
with our VOE test hardware.
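The Iometer figures quoted above can be worked through two simple rules: full-duplex total equals read rate plus write rate, and a backup runs no faster than the slower direction. The 800MB per second per-direction ceiling for 8-Gbit/sec FC is the commonly quoted nominal rate, an assumption here rather than a figure from these tests.

```python
NOMINAL_8GFC_MB_S = 800  # assumed nominal 8GFC data rate per direction

measured = {  # MB/s, as reported in the text
    "Brocade 815": {"read": 786, "write": 782},
    "QLogic QLE2560": {"read": 688, "write": 676},
}


def summarize(read: int, write: int) -> dict:
    total = read + write  # full-duplex aggregate
    return {
        "total": total,
        # share of the nominal full-duplex ceiling (both directions)
        "pct_of_wire": 100 * total / (2 * NOMINAL_8GFC_MB_S),
        # the slower direction bounds backup throughput
        "backup_ceiling": min(read, write),
    }


if __name__ == "__main__":
    for hba, r in measured.items():
        s = summarize(r["read"], r["write"])
        print(f"{hba}: total {s['total']} MB/s, {s['pct_of_wire']:.0f}% of "
              f"nominal full duplex, backup ceiling {s['backup_ceiling']} MB/s")
```

With these inputs the Brocade 815 totals 1,568MB per second (98 percent of the nominal full-duplex ceiling) with a backup ceiling of 782MB per second, while the QLogic QLE2560 totals 1,364MB per second with a backup ceiling of 676MB per second.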
When we tested the Emulex LPe12000, we found a significant
deviation between read and write throughput rates, which was far more
problematic than the difference in aggregate throughput. As a result, our
benchmark projected that the potential throughput of our backup application in a
configuration employing a single LPe12000 HBA would be limited to about 450MB
per second. Given those results, VSM Labs set out to explore how well our
benchmarks projected actual performance in two standard backup scenarios.
In our first throughput test, we ran a backup of the eight
VMs on the host ESX server to a disk storage pool on our backup server.
NetBackup ran all VM backups in parallel. Of particular interest was the first
phase of the backup process, in which the HP DL360 server runs in its VCB proxy
role: it reads the directory of each VM on a shared ESX datastore and
simultaneously writes the snapshot files to a local directory. This process
consistently ran at 500MB per second using the Brocade 815 and the QLogic
QLE2560 HBAs.
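Running all VM backups in parallel amounts to keeping several independent copy streams in flight at once. The sketch below is a generic illustration of that pattern using a thread pool; it is not how VCB or NetBackup work internally, and the function names, file names, and 1MB stand-in "VM image" files are all hypothetical.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Tuple


def copy_one(src: Path, dest_dir: Path) -> int:
    """Copy one file into dest_dir; return the number of bytes copied."""
    dest = dest_dir / src.name
    shutil.copyfile(src, dest)
    return dest.stat().st_size


def copy_in_parallel(sources: List[Path], dest_dir: Path, streams: int = 8) -> int:
    """Copy every source with up to `streams` concurrent workers; return total bytes."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return sum(pool.map(lambda s: copy_one(s, dest_dir), sources))


def demo(num_vms: int = 8, size: int = 1024 * 1024) -> Tuple[int, int]:
    """Create num_vms stand-in files, copy them in parallel; return (bytes, files)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        sources = []
        for i in range(num_vms):
            src = root / f"vm{i}.img"
            src.write_bytes(b"\0" * size)
            sources.append(src)
        staging = root / "staging"
        total = copy_in_parallel(sources, staging)
        return total, len(list(staging.iterdir()))


if __name__ == "__main__":
    print(demo())  # (8388608, 8)
```

Threads suit this I/O-bound work because file reads and writes release Python's global interpreter lock; in a real proxy, the aggregate rate of such streams is bounded by the HBA's full-duplex throughput rather than by the number of workers.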
In our second test of full-duplex throughput, we backed up
the storage pool from our first test to a VTL configured with eight logical tape
drives. For maximum throughput, NetBackup split that process into eight I/O
streams. As our benchmark projected, the Brocade 815 HBA easily sustained an
aggregate full-duplex throughput of 1,300MB per second, which was the I/O
streaming limit of our test hardware. Given the results of our
application-centric benchmark and application testing, the Brocade 8Gbps HBAs
will help meet SLAs associated with business processes in multiple
application-centric environments, including Virtual Operating Environments, Web
2.0 streaming of rich media, and backup and recovery applications.