5 Reasons Your Disaster Recovery Won’t Work

Peter Eicher
Thursday, June 20th 2013

Disaster recovery (DR) operations have long been a challenge for IT organizations. Persistent issues of cost and complexity continue to make DR problematic and, most importantly, unreliable. This article discusses five common reasons why the DR plan you expect to work may fail when you need it most.

Because there are many ways to implement disaster recovery, we will first briefly describe four of the most widely used methods to establish shared reference points for the discussion.

Four Common DR Methods

While not an exhaustive list, the following four methods cover the majority of disaster recovery implementations used today. Keep in mind that many organizations use more than one of these:

  • Tape Shipment: Creating backup tapes and sending them offsite is the oldest DR technique. While it is well understood and familiar, it is slow and expensive, and it exposes your organization to the risk of lost or stolen tapes.
  • Server-to-Server Replication: An excellent method for rapidly recovering individual systems, this technique is difficult to implement at scale and doesn’t offer good options for longer-term data retention.
  • Storage Array Replication: Replicating data from primary storage arrays is a reliable, long-established method, and many vendors have well-proven implementations. However, recovery workflows are often manual, complex and subject to human error, and each vendor has its own toolset. This method is also costly.
  • Backup Disk Appliance Replication: Consolidating backups onto a single disk appliance reduces management complexity, but recovery workflows remain dependent on whatever backup software you are using. Restores still stream data back much as tape does, so they remain slow for large volumes of data.

Five Reasons Your DR Won’t Work

Each of the four methods described above has strengths and weaknesses. We will evaluate each of them against the five reasons your DR won’t work. In some cases, a particular method actually will work. The question is whether any single method can work across all five problem categories.

Reason #1: Your DR Is Too Slow

Every DR plan has a timeline behind it, and that timeline is tied directly to the survival of your business. How many hours or days can your systems be down before your business is damaged beyond repair? The answer will vary for every organization, but fast recovery times are critical for any DR plan.

In terms of recovery speed, both tape and backup appliances rely on data streaming, meaning data must be copied back to another system (e.g., a server or a disk array) before it can be used. Because the data cannot be accessed directly, both of these methods are too slow for many organizations.
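To make the cost of streaming concrete, the sketch below estimates how long a full streaming restore takes and checks it against the downtime a business can tolerate, often called the recovery time objective (RTO). It is a rough model under stated assumptions: the function names, the 10 TB data set, the 200 MB/s sustained throughput and the 8-hour RTO are all hypothetical illustration values, decimal units are used, and real restores also incur overhead (catalog lookups, media handling, verification) that this ignores.

```python
def streaming_restore_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Estimate hours needed to stream data_tb terabytes back at a sustained
    throughput_mb_s megabytes per second (decimal units, no overhead)."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = (data_tb * 1e12) / (throughput_mb_s * 1e6)
    return seconds / 3600


def meets_rto(data_tb: float, throughput_mb_s: float, rto_hours: float) -> bool:
    """Return True if the estimated streaming restore fits within the RTO."""
    return streaming_restore_hours(data_tb, throughput_mb_s) <= rto_hours


if __name__ == "__main__":
    # Hypothetical example: 10 TB restored at 200 MB/s against an 8-hour RTO.
    hours = streaming_restore_hours(10, 200)
    print(f"Estimated restore time: {hours:.1f} hours")  # prints 13.9 hours
    print("Meets 8-hour RTO:", meets_rto(10, 200, 8))    # prints False
```

Because the data is unusable until the copy finishes, restore time grows linearly with data volume, which is why streaming-based methods fall behind as data sets grow.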

Server replication provides very effective recovery times, since the target server can usually just be switched on. However, this method requires managing replication separately for every application, and that management complexity limits its usefulness.

Disk array replication gives you rapid access to data in an immediately usable format, but access to data isn’t enough. How are you recovering applications and system data? Those issues must be addressed outside the disk array itself. This adds considerable complexity and can greatly extend recovery times, even though the data itself is available.

Conclusion: All of these methods are flawed. Server replication is perhaps the best solution for a smaller organization or for key servers, but it does not scale well.