Compellent SIR
By Jack Fegreus

published: Monday, February 25 2008

Simplifying Virtual Server Management

Through the construct of a pointer-based Replay, a Compellent SAN provides a fast mechanism with minimal I/O processing overhead to clone disk volumes, which is particularly useful for cloning bootable OS volumes for virtual machines.

In the 2006 McKinsey survey of senior IT executives, system and storage virtualization shared the spotlight as the key strategies for cutting operational costs. These IT executives find themselves faced with managing large server farms that grew out of the need to isolate mission-critical business applications in order to ensure the performance and scalability of those applications. While that strategy succeeded in delivering application performance and scalability, it left IT struggling to deal with a new issue: resource optimization. At too many sites, over-provisioning of resources led to resource-utilization rates that hover around 10-to-20 percent.

Many IT decision makers now see system virtualization as a silver bullet for driving up resource utilization rates without negatively impacting the reliability, availability, and serviceability (RAS) provided by their existing server farms. In the Symantec State of the Data Center 2007, a plurality of IT decision makers chose server virtualization, followed by consolidation, as the best strategies for containing data center costs. Moreover, by a 3-to-1 margin, sites implementing server virtualization were choosing to set up a VMware Virtual Infrastructure (VI) environment.

To get the maximum value from a VM, IT must avoid any constraints that bind that VM to a physical server. First and foremost, there will be the need to handle load balancing and failover of virtual machines. In addition, there will be the need to move VM configurations in and out of development, test, and production environments. That means all virtual machines on all physical hosts must be capable of accessing the same storage resources and that makes a storage area network (SAN) essential.

For storage resources, separating logical functionality from the constraints of physical implementation starts with the adoption of a SAN. That starting point, however, often leads to a very complex rather than a very simple environment, as silos of technology burden rather than relieve resource management. To avoid that pitfall, Compellent markets Storage Center as a complete modular SAN solution that encompasses both Fibre Channel and iSCSI connectivity, not as a SAN component.

What at first glance appears to be a retro marketing strategy is actually driven by a very advanced virtualization construct that results in a very remarkable value proposition. The hallmark of the Compellent Storage Center is a dramatic reduction in SAN TCO garnered through the automation of an astonishing number of storage management tasks. To reach that level of storage-management automation, the Compellent Storage Center radically restructures the way storage is virtualized. Traditional SAN software virtualizes storage based on partitions of physical RAID volumes: Compellent Storage Center virtualizes storage based on disk blocks in a scheme dubbed Dynamic Block Architecture.

Starting with the most basic functionality of a JBOD (just a bunch of disks) folder, Storage Center creates a virtual pagepool of disk blocks. In so doing, Compellent generates a rich collection of metadata. Each logical disk block is associated with a collection of metadata tags that represent notions that are normally associated with file-level and volume-level storage constructs.

File-oriented metadata includes such notions as data type and time stamps for events, such as data block creation, last access, and last modification. Volume-oriented metadata includes the type of disk drive, the associated disk tier, the underlying RAID level, and the corresponding logical volume. The result of this level of virtualization is a powerful synergy within a VI environment, which is uniquely capable of leveraging a SAN.
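As a rough illustration of the per-block tagging described above, the following Python sketch models a page pool whose blocks carry both file-oriented and volume-oriented metadata. The class name, field names, sample values, and the `blocks_idle_since` helper are all hypothetical; they are not taken from Storage Center, whose internal structures are not described here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BlockMetadata:
    """Hypothetical per-block tags; names are illustrative only."""
    # File-oriented tags
    data_type: str
    created: datetime
    last_accessed: datetime
    last_modified: datetime
    # Volume-oriented tags
    drive_type: str
    disk_tier: int
    raid_level: str
    logical_volume: str


def blocks_idle_since(pool, cutoff):
    """Return the IDs of blocks whose last access predates the cutoff."""
    return sorted(bid for bid, meta in pool.items() if meta.last_accessed < cutoff)


# A two-block pool with made-up values.
pool = {
    0: BlockMetadata("database", datetime(2008, 1, 1), datetime(2008, 2, 20),
                     datetime(2008, 2, 20), "FC 15K", 1, "RAID 10", "vol-a"),
    1: BlockMetadata("archive", datetime(2007, 6, 1), datetime(2007, 6, 2),
                     datetime(2007, 6, 2), "SATA", 3, "RAID 5", "vol-a"),
}
idle = blocks_idle_since(pool, datetime(2008, 1, 1))
```

Because the tags sit on individual blocks rather than on whole volumes, a query such as this one can single out rarely used blocks inside an otherwise busy volume.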

Commercial operating systems, such as Microsoft Windows and Linux, assume exclusive ownership of their storage volumes. As a result, neither Windows nor Linux incorporates a distributed file locking mechanism in its file system. Without such a distributed lock manager (DLM), virtualization of volume ownership is the only means of preventing the corruption of disk volumes through inadvertent volume sharing.

On the other hand, the file system for VMware ESX Server, dubbed VMFS, has a built-in mechanism with which to handle distributed file locking. What's more, VMFS avoids the massive overhead typically incurred by a DLM by treating each disk volume as a single-file image, similar to the way an ISO-formatted CD-ROM is handled. When a commercial OS in a VM mounts a disk, ESX opens a disk-image file; VMFS locks that image file; and the VM's OS gains the exclusive ownership to all of the files contained in the disk volume image as expected.

When leveraging virtual servers in server consolidation projects, a SAN based on Dynamic Block Architecture can also generate considerably more cost-avoidance savings. What makes these savings possible is a number of advanced features that center on Data Instant Replay, the Storage Center application that introduces the construct of a Replay. Like a typical snapshot, a Replay represents a data volume at a particular point-in-time; however, for a Replay, that point-in-time is a virtual point-in-time.

The three dominant snapshot technologies are copy-on-write, redirect-on-write, and split mirror. In each of these schemes, data must be written for the snapshot as soon as the snapshot is created: Whether or not the snapshot is used is irrelevant. In contrast, only a minimum amount of metadata is required when a Replay is created: Actual data is never written until the Replay is mapped to a server as a logical volume and put into use.

The most prevalent snapshot technology is copy-on-write, which is used by the Linux Logical Volume Manager. When a copy-on-write snapshot is created, metadata is written to the snapshot about the location of the original data. The snapshot then tracks writes that change data blocks belonging to the original volume. Before a write can change a block, the original block is copied to the snapshot. As a result, every write to the original volume will now require two writes: The first write preserves the original data by copying it to the snapshot and the second write updates the original data.
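The double write can be made concrete with a toy model. The Python sketch below is not how LVM or any product implements copy-on-write; the `CowVolume` class and its counters are hypothetical and exist only to count physical writes. In this sketch, as in typical implementations, only the first overwrite of a given block after the snapshot pays for the extra copy.

```python
class CowVolume:
    """Toy copy-on-write model; assumes writes target existing blocks."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # live volume: block number -> data
        self.snapshot = None        # preserved originals, once a snapshot exists
        self.physical_writes = 0

    def create_snapshot(self):
        # Only bookkeeping happens at creation time; no block data is copied.
        self.snapshot = {}

    def write(self, block_no, data):
        needs_copy = (
            self.snapshot is not None
            and block_no in self.blocks
            and block_no not in self.snapshot
        )
        if needs_copy:
            # First write: preserve the original block in the snapshot area.
            self.snapshot[block_no] = self.blocks[block_no]
            self.physical_writes += 1
        # Second write: update the live volume.
        self.blocks[block_no] = data
        self.physical_writes += 1

    def read_snapshot(self, block_no):
        # Changed blocks come from the snapshot area, unchanged ones from the live volume.
        return self.snapshot.get(block_no, self.blocks.get(block_no))


vol = CowVolume({0: "a", 1: "b"})
vol.create_snapshot()
vol.write(0, "A")                      # costs two physical writes
writes_after_first_update = vol.physical_writes
vol.write(0, "AA")                     # block already preserved: one write
```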

Redirect-on-write, which is used by NetApp Filer appliances, is similar to copy-on-write; however, this snapshot scheme does not incur the double write overhead penalty. In the redirect-on-write scheme, new writes to the original volume are redirected to a new location set aside at the creation of the snapshot. Since the original data is not being overwritten, only one write is necessary.

While the double-write penalty is avoided, this scheme is complicated by its use of the original volume as a logical snapshot, as the snapshot location now contains all of the original volume's updates. As a result, when a snapshot is deleted or automatically expired, there is a new overhead penalty, as the data in the snapshot location must be reconciled back into the original volume. Moreover, that process grows in complexity as the number of snapshots increases and the working dataset becomes more fragmented.
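The trade-off in redirect-on-write, a cheap update followed by a costly snapshot deletion, can be sketched in the same toy style. The `RowVolume` class below is hypothetical and does not reflect any vendor's implementation; it assumes the snapshot is taken at the moment the object is constructed.

```python
class RowVolume:
    """Toy redirect-on-write model; the snapshot is taken at construction."""

    def __init__(self, blocks):
        self.original = dict(blocks)  # frozen blocks; doubles as the snapshot
        self.redirected = {}          # new writes land in this separate area
        self.physical_writes = 0

    def write(self, block_no, data):
        # A single write: the original block is left untouched.
        self.redirected[block_no] = data
        self.physical_writes += 1

    def read(self, block_no):
        # Live view: prefer redirected data, fall back to the original.
        return self.redirected.get(block_no, self.original.get(block_no))

    def read_snapshot(self, block_no):
        return self.original.get(block_no)

    def delete_snapshot(self):
        # Reconciliation: every redirected block is written back to the original.
        merged = len(self.redirected)
        self.original.update(self.redirected)
        self.redirected.clear()
        self.physical_writes += merged
        return merged


vol = RowVolume({0: "a", 1: "b"})
vol.write(0, "A")
vol.write(1, "B")
writes_before_delete = vol.physical_writes  # one write per update
snapshot_view = vol.read_snapshot(0)        # still the original data
merged = vol.delete_snapshot()              # deferred cost is paid here
```

The write counter shows where the cost moves: updates are single writes, but deleting the snapshot triggers one extra write for every block changed in the meantime.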

Employed by EMC Symmetrix storage arrays, the split mirror scheme creates a physical clone of a volume. The entire contents of the original volume are copied onto a synchronized mirror volume. Storage administrators can make clones instantaneously available by "splitting" a mirror. This snapshot method requires as much storage space as the original data and imposes the overhead of writing data synchronously to the original volume and its mirror copy.

In stark contrast, Compellent Data Instant Replay leverages the logical block implementation and the volume-oriented metadata of Dynamic Block Architecture to provide the benefits of all three traditional snapshot techniques, while avoiding all of their limitations. In particular, a Replay preserves only pointers to blocks that have changed since a prior Replay. As a result, the amount of storage utilized and the required level of I/O processing are both minimal, and there is no limit on the number of point-in-time copies that can be handled. More importantly, an automated Replay can be scheduled via a Replay template as frequently as is necessary.

In particular, Replays first freeze the data blocks for a point in time as read-only and establish metadata pointers to that data, which is similar to a copy-on-write snapshot. By freezing the original data blocks as read-only, however, there is no need to copy those blocks when data is altered. That task is also handled by pointers. As a result, Replays do not incur the copy-on-write overhead of having to make two physical I/Os on an update to the original data file once a snapshot is initiated.

Through the use of pointers, Replays logically redirect data updates to the original volume data much like the redirect-on-write scheme. Nonetheless, by using logical pointers rather than creating physical block regions for the updates, there is no file fragmentation. What's more, Replays can be deleted or expired without having to resynchronize the data as in a redirect-on-write snapshot.
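The pointer-based behavior described in the last few paragraphs can be approximated with another toy model. The `ReplayVolume` class below is a hypothetical sketch, not Compellent's implementation. As a simplification, each Replay here stores a full pointer map, whereas the text states that a real Replay keeps pointers only for blocks changed since the prior Replay.

```python
class ReplayVolume:
    """Toy pointer-based snapshot model over a shared page pool."""

    def __init__(self):
        self.pages = {}    # shared page pool: page id -> data
        self.active = {}   # live volume: block number -> page id
        self.replays = {}  # Replay name -> frozen pointer map
        self.next_page = 0
        self.physical_writes = 0

    def _collect(self):
        # Free any page that neither the live volume nor a Replay points to.
        live = set(self.active.values())
        for pointers in self.replays.values():
            live.update(pointers.values())
        for page_id in list(self.pages):
            if page_id not in live:
                del self.pages[page_id]

    def write(self, block_no, data):
        # One physical write into a fresh page; frozen pages are never overwritten.
        page_id = self.next_page
        self.next_page += 1
        self.pages[page_id] = data
        self.active[block_no] = page_id
        self.physical_writes += 1
        self._collect()

    def create_replay(self, name):
        # Metadata only: copy pointers, not data.
        self.replays[name] = dict(self.active)

    def read(self, block_no, replay=None):
        pointers = self.active if replay is None else self.replays[replay]
        return self.pages[pointers[block_no]]

    def delete_replay(self, name):
        # No data is merged back; pointers are dropped and orphaned pages freed.
        del self.replays[name]
        self._collect()


vol = ReplayVolume()
vol.write(0, "a")
vol.write(1, "b")
vol.create_replay("r1")                 # no physical write
vol.write(0, "A")                       # a single physical write
old_view = vol.read(0, replay="r1")
pages_with_replay = len(vol.pages)
vol.delete_replay("r1")                 # no physical write
```

In this model, creating or deleting a Replay never touches the write counter, and an update after a Replay costs one write rather than two.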

Finally, since Replays only contain pointers to data, converting a Replay into what Compellent dubs a "View Volume," which can be mounted and utilized by a SAN client system, is essentially instantaneous. That makes the process of creating and mounting a View Volume at least as fast as the process of breaking and mounting a split-mirror snapshot. More importantly, there is no need to pause I/O processing before invoking the creation of a View Volume, as there is when breaking a mirror.

More importantly, that process of creating a View Volume from a Replay can be further applied to a very important case for server consolidation projects for virtual and physical servers alike. By providing for centralized server booting via SAN-based disk volumes, IT can cut capital expenses by eliminating the need for any internal server disks. This opens the door to a number of potential hard- and soft-cost savings.

The savings start with the ability to implement low-cost diskless or blade servers, which do not require RAID-enabled HBAs or high-end power supplies. This helps to lower the power, cooling, and space needed for servers significantly. In addition, server maintenance contracts can be downgraded, since the OS, the applications, and the data are now all independent of the physical server and its health. Furthermore, labor-intensive storage and system administration tasks are simplified with all boot images physically separated from the servers and managed from a single centralized SAN console.

Server Instant Replay integrates with Storage Center and Data Instant Replay and helps automate the very subtle and complex process of cloning an OS volume. The Server Instant Replay wizard carefully guides a server administrator through the process of creating a new boot volume from an existing OS boot volume, with particular attention paid to mapping that new volume in a way that allows a server to boot from the volume over the SAN. In a VMware VI environment, Server Instant Replay can be used to extend the capabilities of the basic VI Client or enhance the template functionality of Virtual Center.  

Use of the Server Instant Replay wizard not only reduces the amount of time required to deploy and provision a server OS, it also ensures that the task is performed in the same way every time it is performed. Moreover, a server administrator can carry out the entire task of provisioning a server with a new boot volume quickly and accurately without any assistance from a storage administrator. Whether deploying one or multiple servers, the use of Server Instant Replay saves significant labor costs when compared to typical local setup or Boot from SAN processes. What's more, streamlining server provisioning and recovery cuts both server and storage management time and increases the capabilities and productivity of server and storage administrators.

Another very important role for the Server Instant Replay application is in a DR scenario. Just as global operations, strict governmental regulations on records retention, and the new focus on eDiscovery in civil litigation have changed the nature of backup, so too have these forces altered the fundamental constructs of recovery.

The old notion of data recovery from the previous day's backup is in no way sufficient to satisfy a number of regulatory requirements. For that reason, IT must now plan for the recovery of applications along two dimensions: the time that can elapse before the application is back online, dubbed the Recovery Time Objective (RTO), and the amount of data that must be recovered, dubbed the Recovery Point Objective (RPO). Moreover, IT recovery costs rise as the time window of the RTO shrinks and as the acceptable amount of data that must be recovered for the RPO grows larger.

In dealing with the dynamics of DR, the importance of Server Instant Replay again arises out of its integration with the core Storage Center functions and other Storage Center applications. In a DR scenario, the key application is Remote Instant Replay.

The scheme is remarkably clever in its simplicity: In a process dubbed Thin Replication, Remote Instant Replay replicates Replays on remote systems. The process can be set up to use either synchronous or asynchronous communications. Independently of that synchronization choice, Thin Replication follows the tenets of Dynamic Block Architecture by sending only written data and excludes any allocated but unused space on a volume. Following the initial synchronization process, only changed data is transmitted.
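The data-reduction idea behind Thin Replication can be sketched in a few lines. The `replicate` function below is a hypothetical illustration, not the Remote Instant Replay protocol. It models a volume as a sparse dictionary of written blocks, so allocated but unwritten space never appears in the transfer, and it uses the previous Replay to find changed blocks after the initial synchronization.

```python
def replicate(local_blocks, remote_blocks, previous_replay=None):
    """Copy blocks to the remote side and return how many were transmitted."""
    if previous_replay is None:
        # Initial synchronization: send every written block, nothing else.
        to_send = dict(local_blocks)
    else:
        # Later passes: send only blocks that differ from the previous Replay.
        to_send = {
            block_no: data
            for block_no, data in local_blocks.items()
            if previous_replay.get(block_no) != data
        }
    remote_blocks.update(to_send)
    return len(to_send)


allocated_blocks = 1000        # size of the volume as provisioned
local = {0: "a", 5: "b"}       # only two blocks have ever been written
remote = {}

initial_sent = replicate(local, remote)              # 2 of 1000 blocks
replay_1 = dict(local)                               # checkpoint after first sync

local[5] = "B"                                       # change one block
local[9] = "c"                                       # write one new block
delta_sent = replicate(local, remote, replay_1)      # only the differences
```

Here the initial pass transmits two blocks out of a thousand allocated, and the second pass transmits only the changed block and the newly written one.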

As a result, the costs of DR contingency planning can be much more easily controlled. Thin Replication creates a copy of the replicating system's actual data along with an unlimited number of replays on the remote connection. There is no need to provision standby systems with configurations that are identical to critical production systems. What's more, Thin Replication helps optimize both dimensions of a DR strategy via low-overhead Replays that do double duty as RPO and RTO checkpoints.

The unlimited granularity of Replays enables more recovery points. Sites can then use that granularity to extend the use of asynchronous replication and still ensure meeting a demandingly high RPO. That allows them to avoid the I/O burden of a two-phase commit, which occurs with synchronous replication. Under the Remote Instant Replay scheme, a Replay created by Data Instant Replay on the replicating system is sent intact to the remote connection. These Replays now serve as re-synchronization checkpoints to reduce the amount of data that needs to be transferred from the local system to the remote system in the event of a communication failure. What's more, the low impact of Thin Replication combined with asynchronous communications provides IT with the flexibility to utilize Replays as multiple recovery-point locations.

The Replay checkpoints copied to the remote connection system also serve as remote recovery points should the data need to be recovered on that system after a disaster. By combining Server Instant Replay with Remote Instant Replay, Compellent can dramatically lower the RTO without the cost and complexity of traditional snapshot schemes. In a traditional snapshot scenario, IT must replicate point-of-failure logs along with snapshots. Then in the event of a disaster, IT must run those logs against the recovered snapshots as part of the recovery process in order to meet a high RPO.

On the other hand, Server Instant Replay works as fast and efficiently on replicated Replays as it does on local Replays. As a result, Server Instant Replay can be used to recover a volume in a DR scenario in a matter of minutes from any previous Replay checkpoint. What's more, just as on a local server, there is no need to break a mirror or pause any I/O processing in order to invoke the process of creating a View Volume and mapping it to a server. That means IT is now free to test its DR plans as frequently as is necessary in order to meet any unique business continuity constraints.

The bottom line for disaster recovery is that it's a matter of when, not if. Whether the problem comes about through a fast-spreading Internet Warhol worm, a natural disaster, or an external event, such as a major accident or a labor strike, a disaster will happen and the impact on business continuity must be minimized in the most cost-effective manner.

More importantly, the group of tasks associated with recovering a server in a DR scenario is a microcosm for the group of tasks that occur regularly in a large-scale Virtual Infrastructure environment. In particular, running an application on a dedicated VM is ranked a best practice for IT for enhancing reliability and availability. Tactically, that strategy requires IT to utilize OS templates when provisioning a new VM. Via the Server Instant Replay wizard, IT can automate that provisioning process to make it repeatable and cost effective.




 