IT Recovery Automation: The Solution to Short- and Long-Term Outages

Kaushik Ray
Wednesday, December 23rd 2015

Mention “IT outage” and thoughts turn to superstorms: hurricanes, tornadoes or some other natural disaster wreaking havoc on your critical data. Typically, though, the cause of a data disruption is much more mundane, such as a power failure, a hardware glitch or even (you can’t make this up) a squirrel, a dropped anchor or a burning cigarette butt left in the wrong place. The last few are unusual, to be sure, but they still disrupt IT operations.

Grasping this, chief information officers at many businesses, especially enterprises, are recognizing that their disaster recovery programs constitute more than just IT insurance. They often spend heavily on duplicate data-storage systems, frequently located far from the primary IT facility, so they can recover their invaluable data swiftly should a calamity strike. Yet they also realize that few IT outages (14 percent, by one estimate from the DR Benchmark study) are weather-related.

Consequently, more CIOs recognize that their organizations must be resilient to any outage and retain access to the applications and services they need to run the business at peak performance. These CIOs, especially at enterprises, are focusing much more heavily on automating their IT processes and infrastructure, particularly as they move more and more of their data to cloud and virtualized environments.

Cloud computing and virtualization aren’t interchangeable, although the technologies are closely related. Virtualization is software that manipulates hardware, making it possible to run multiple operating systems and applications on the same server at the same time; it is the fundamental technology powering cloud computing. Cloud computing, by contrast, is the service that results from that manipulation, delivering shared computing resources, software or data.

IT interruptions and failures do occur in the cloud, dispelling the myth that the cloud is invincible. When they occur, they can prove very serious because cloud services often serve far more people than locally run operations, so they attract great attention.

In 2014, for instance, notable cloud outages disrupted operations at Dropbox twice in two months, Google three different times, Samsung’s Smart TV service for 4½ hours, Adobe for about 28 hours and Microsoft twice in two days, among others. None of these incidents was sparked by a natural disaster such as the devastating Hurricane Sandy, which disrupted hundreds of IT operations along the northern Atlantic coast in 2012.

Most Common Causes of IT Outages

What are the most common causes of IT outages other than weather-related and other natural disasters? In 2013, the Ponemon Institute and Emerson Network Power updated a previous study of data center outages, including their root causes. The most frequently cited root causes include uninterruptible power supply (UPS) failures, whether from a failed battery or other equipment or from exceeded UPS capacity; human error; cyberattacks; IT equipment failure; flooding or other water-related problems; heat-related or computer room air-conditioning failures; and power distribution unit or circuit breaker failures.

Of the 450-plus U.S. data centers surveyed, 83 percent said they knew the root cause of their unplanned outages, and 52 percent believed all or most of those outages could have been prevented.