Brocade HBA - Stream I/O Performance
By Jack Fegreus
published: Friday, June 12 2009


Streaming Multiple VM Backups to Minimize RTO

To meet a minimal recovery time objective (RTO), IT needs software that can run multiple VM backup processes in parallel. Before that can happen, however, IT needs server HBAs, such as the Brocade 815 and 825, that are capable of streaming full-duplex I/O at wire speed.

 

To alleviate the friction between business processes and inflexible IT infrastructure, savvy CIOs are utilizing virtual machines (VMs) in a Virtual Operating Environment (VOE), such as that created by VMware, to provide the reliability, availability, and scalability of racks of servers while reducing capital and operating expenses. Nonetheless, the road to nirvana has its complications. Virtualization dramatically increases I/O loads on servers, and the expansion of government regulations on risk management highlights the danger that the failure of a single host server will cascade to multiple VMs running multiple applications. In this white paper, VSM Labs focuses explicitly on the server I/O problems associated with implementing an end-to-end backup and recovery process within a VOE.

 

With servers hosting eight or more VMs, an FC HBA can no longer be regarded as a simple commodity product. As the number of VMs sharing the HBAs in a host server increases, SAN fan-out becomes a server issue as well as a switch issue. In a VOE server, HBAs have to play the role of virtual switches for virtual fabrics created by virtual HBAs assigned via N_Port ID virtualization. To deal with this issue, Brocade HBAs employ a high-performance ASIC that supports 500,000 IOPS per port and incorporates an 8-lane Gen 2.0 PCIe interface for 40 Gbit/sec internal server throughput. This level of performance is particularly important for supporting the high level of SAN I/O throughput needed by end-to-end backup and recovery processes in a VOE.
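The 40 Gbit/sec figure can be checked with a little arithmetic. The sketch below assumes the standard PCIe 2.0 parameters of 5 GT/sec per lane and 8b/10b encoding, neither of which is stated in the text above. On those assumptions, 40 Gbit/sec is the raw signaling rate per direction, and the rate left for data after encoding overhead is somewhat lower.

```python
# Assumptions (standard PCIe 2.0 figures, not taken from the article):
# each lane signals at 5 GT/sec and uses 8b/10b encoding.
GT_PER_LANE = 5.0
ENCODING_EFFICIENCY = 8 / 10


def pcie2_bandwidth_gbit(lanes):
    """Return (raw, usable) bandwidth per direction in Gbit/sec."""
    raw = lanes * GT_PER_LANE
    usable = raw * ENCODING_EFFICIENCY
    return raw, usable


raw, usable = pcie2_bandwidth_gbit(8)
print(f"raw: {raw:.0f} Gbit/sec, usable: {usable:.0f} Gbit/sec")
```

Protocol overhead above the encoding layer would reduce the usable figure further, so this is an upper bound rather than a measured rate.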

 

To optimize resource utilization, sites typically run eight or more VMs on host servers that utilize multi-core processors. Dense VM configurations put significant stress on I/O throughput for host servers, which must virtualize all of the SAN hardware for multiple VMs.

 

A major issue for VOE backup is the significant I/O overhead incurred in a process that utilizes VMware Consolidated Backup (VCB). In a VCB-based backup, all data must be read and written twice: once to a local directory on a server dubbed the VCB proxy server, and then again to the backup media. With data being simultaneously read and written in both phases of a VCB-based backup, achieving optimal efficiency requires a VCB proxy server that can provide a very high level of I/O throughput.
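To make the overhead concrete, the following minimal sketch counts the data a VCB proxy server must move for a given amount of VM data. The function names, the 400GB dataset, and the 500MB per second rate are hypothetical values chosen for illustration only.

```python
def vcb_io_volume_gb(dataset_gb):
    """Total data read and written by the proxy across both VCB phases.

    Phase 1 copies snapshots to a local directory; phase 2 copies that
    directory to backup media. Each phase reads the data once and
    writes it once.
    """
    phases = 2
    total_read = phases * dataset_gb
    total_written = phases * dataset_gb
    return total_read, total_written


def phase_seconds(dataset_gb, duplex_mb_per_sec):
    """Time for one phase when reads and writes overlap at the given rate.

    Uses decimal units (1 GB = 1000 MB).
    """
    return dataset_gb * 1000 / duplex_mb_per_sec


# Hypothetical example: 400 GB of VM data, 500 MB/sec in each direction.
read_gb, written_gb = vcb_io_volume_gb(400)
print(read_gb, written_gb)      # GB read, GB written by the proxy
print(phase_seconds(400, 500))  # seconds per phase
```

Because reads and writes overlap within each phase, the time per phase depends on the full-duplex rate the proxy server can sustain, not on its one-way rate alone.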

 

What's more, VCB can move all data over a SAN using just the HBAs installed in the VCB proxy server. That puts a premium on the ability to reach high I/O throughput levels without the need for manual tuning and intervention by system and storage administrators. With operating costs dominating capital costs for storage resources, any solution that requires significant manual configuration or tuning efforts cannot be cost effective. Within that context, VSM Labs examined the Brocade 815 and 825 HBAs, which are also available directly through HP as the HP 82B PCIe single and dual port HBAs.

 

To provide a baseline for our VOE backup testing, we first installed three single-port 8-Gbit/sec HBAs (a Brocade 815, a QLogic QLE2560, and an Emulex LPe12000) on a quad-core HP ProLiant DL360 server running Windows Server 2003 and the Intel Iometer benchmark. This server would later assume the role of our VCB proxy server. At the center of our 8-Gbit/sec test fabric, we configured a Brocade 300 switch.

 

To provide a complete VOE test environment, openBench Labs utilized three servers, along with a SEPATON S2100-ES2 virtual tape library (VTL) with two 4-Gbit/sec FC ports, and a Xiotech® Emprise 5000 storage system also having two 4-Gbit/sec FC ports. We hosted eight VMs running Windows Server 2003 on a quad-processor HP ProLiant DL580 server running VMware ESX Server 3.5 and managed our VOE from an HP ProLiant DL360 server running VMware vCenter Server (a.k.a. Virtual Center) on Windows Server 2003.

 

To handle the end-to-end backup process, we installed Veritas NetBackup with VMware Consolidated Backup (VCB) on a second quad-core HP ProLiant DL360 server running Windows Server 2003. VCB installs a virtual LUN driver on a Windows server, dubbed the VCB proxy. In a backup, VCB directs the ESX host to create a snapshot for each logical volume of a VM. The Windows server uses the virtual LUN driver to copy the snapshots into a local directory. As a result, the backup application is able to back up that local directory containing the copied snapshots and avoid any processing impact on either the VMs or the ESX server. This is why the VCB proxy server's SAN connection is so important.

 

As the capabilities of devices, such as the Brocade HBAs, advance exponentially, the number of applications able to leverage all of those capabilities becomes much smaller. The Brocade 815 and 825 8Gbps HBAs are a perfect example of this trend. These HBAs support the highest levels of performance for both bandwidth- and IOPS-intensive applications. The vast majority of standard business applications, however, fall into only one of those camps.

 

Virtualization and backup are everyday applications that require the highest level of I/O bandwidth; however, ultimate IOPS is well beyond their modest transaction processing needs. Moreover, current multicore CPUs are very cost effective at hosting VMs, but these CPUs are simply unable to deliver the low I/O latency needed to exploit the ability of the Brocade ASIC to generate 500,000 IOPS.

 

For backup, the important application-centric metric is full-duplex (simultaneous read and write) throughput. An 8-Gbit/sec HBA must be able to read data at 8 Gbit/sec and write data at 8 Gbit/sec at the same time. Equally important is the balance between read and write throughput. The controlling factor in any backup is the slowest rate of the slowest device.

 

Using Iometer to generate multiple sequential read and write streams, we measured near wire-speed full-duplex throughput only with the Brocade 8Gbps HBAs. Total I/O throughput reached an average of 1,568MB per second, with sustained reads measured at 786MB per second and sustained writes measured at 782MB per second.
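As a quick check on the "near wire-speed" description, the sketch below compares the measured rates with the nominal 8-Gbit/sec Fibre Channel data rate. The 800MB per second per direction figure is an assumption (the commonly quoted nominal rate for 8-Gbit/sec FC), not a number reported in this paper.

```python
# Assumption: nominal 8-Gbit/sec FC data rate per direction, in MB/sec.
NOMINAL_8GFC_MB_PER_SEC = 800

# Sustained rates measured with the Brocade HBA, in MB/sec.
read_mb, write_mb = 786, 782

total_mb = read_mb + write_mb
wire_fraction = total_mb / (2 * NOMINAL_8GFC_MB_PER_SEC)

print(total_mb)                 # 1568
print(f"{wire_fraction:.1%}")   # 98.0%
```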

 

The read and write I/O rates of the QLogic QLE2560 were in balance; however, total throughput was about 12 percent less than the level measured using the Brocade 815 HBA. In particular, throughput was 688MB per second for reads and 676MB per second for writes. Those results pegged potential throughput for backup at 675MB per second, which is just slightly higher than the maximum throughput sustainable with our VOE test hardware.
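The way a projected backup rate follows from these measurements can be sketched in a few lines. The model below is a simplification, assuming that a backup stream can move data no faster than the slower of the HBA's sustained read and write rates; the function name is hypothetical.

```python
# Sustained (read, write) rates from the Iometer tests, in MB/sec.
measured = {
    "Brocade 815": (786, 782),
    "QLogic QLE2560": (688, 676),
}


def projected_backup_rate(read_mb, write_mb):
    """Data is read and written simultaneously, so the slower
    direction caps the rate at which a backup can stream."""
    return min(read_mb, write_mb)


for hba, (read_mb, write_mb) in measured.items():
    print(hba, projected_backup_rate(read_mb, write_mb))
```

For the QLE2560 this gives 676MB per second, in line with the roughly 675MB per second projection quoted above. In a full backup path, the same minimum would also be taken over every other device in the chain.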

 

When we tested the Emulex LPe12000, there was a significant deviation between read and write throughput rates that was far more problematic than the difference in aggregate throughput. As a result, our benchmark projected that the potential throughput of our backup application in a configuration employing a single LPe12000 HBA would be limited to about 450 MB per second. Given those results, VSM Labs set out to explore how well our benchmarks projected actual performance in two standard backup scenarios.

 

In our first throughput test, we ran a backup of the eight VMs on the host ESX server to a disk storage pool on our backup server. NetBackup ran all VM backups in parallel. Of particular interest was the first phase of the backup process. In that phase, the HP DL360 server runs in its VCB proxy role. The server reads the directory of each VM on a shared ESX datastore and simultaneously writes any snapshot files to a local directory. This process consistently ran at 500MB per second using the Brocade 815 and the QLogic QLE2560 HBAs.

 

In our second test of full-duplex throughput, we backed up the storage pool used in our first test to a VTL that had been configured with eight logical tape drives. For maximum throughput, NetBackup split that process into eight I/O streams. As our benchmark projected, the Brocade 815 HBA easily sustained an aggregate full-duplex throughput of 1,300MB per second, which was the I/O streaming limit of our test hardware. Given the results of our application-centric benchmark and application testing, the Brocade 8Gbps HBAs will help guarantee any SLA associated with business processes for multiple application-centric environments, including Virtual Operating Environments, Web 2.0 streaming of rich media, and backup and recovery applications.