Feds Can Optimize Disaster Recovery Solutions in the Cloud

DRaaS provides flexibility, efficiency and reliability when emergencies strike.

Mike Chapple is associate teaching professor of IT, analytics and operations at the University of Notre Dame.

Every IT leader shares this nightmare: critical systems down, users enraged, data lost. Without reliable and timely access to data, political leaders and agency staff cannot carry on their work, and constituents cannot rely on their government. Disaster-recovery programs assure those groups that their data will be protected from loss and available for use, no matter the emergency.

At the same time, agency IT teams remain under pressure to deliver a higher level of service at lower cost — and recovery from a disaster, natural or manmade, is expensive.

Many agencies seeking to meet these objectives turn to the cloud to provide disaster recovery as a service, or DRaaS. Cloud services’ pay-as-you-go pricing model is ideal for rarely used disaster-recovery environments. Rather than running backup data centers full of equipment that sits idle most of the time, agencies can pay to store data in the cloud and avoid overhead computing costs until the site is activated.

As agencies modernize their computing infrastructure, they should consider revising recovery strategies to fully leverage the benefits of the cloud.

Determine Which Services Must Be Recovered First

Not all services deserve the same amount of attention during an outage. Restoring applications that support emergency-incident management or Social Security payment processing, for example, is far more urgent than restoring those that support office supply procurement or FOIA request processing. As a result, disaster recovery and business continuity require different strategies.

Agencies can use two key metrics to help differentiate those goals.

Work closely with departments to determine, for each IT service, an acceptable recovery-time objective (RTO), the longest the service can stay down, and an acceptable recovery-point objective (RPO), the longest span of recent data the agency can afford to lose. That process generally involves a lot of discussion. Most teams believe all of their services are critical and propose RTO and RPO values that would incur prohibitive costs or that the agency may not have the capacity to meet. Determining final RTO and RPO values that are achievable as well as cost-effective typically requires stakeholder education and tactful negotiation.

Functional and IT teams should use RTO and RPO values to guide planning and recovery efforts. For example, in the event of a major failure, prioritize the recovery of services with the shortest RTO in order to meet the most urgent needs first.
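
As a rough illustration of that prioritization, the Python sketch below orders a small service catalog by RTO. The service names echo the examples in this article, but every RTO and RPO figure is a hypothetical placeholder, not a recommendation, and the `Service` and `recovery_order` names are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    rto_minutes: int  # longest tolerable downtime (hypothetical value)
    rpo_minutes: int  # longest tolerable window of lost data (hypothetical value)


def recovery_order(services):
    """Return services sorted so the shortest RTO is recovered first.

    Ties are broken by the shorter RPO, then by name, so the order is stable.
    """
    return sorted(services, key=lambda s: (s.rto_minutes, s.rpo_minutes, s.name))


# Illustrative catalog only; real values come out of the negotiation
# with each department.
SERVICES = [
    Service("office-supply-procurement", 4320, 1440),
    Service("emergency-incident-management", 30, 5),
    Service("foia-request-processing", 2880, 1440),
    Service("benefit-payment-processing", 60, 15),
]

if __name__ == "__main__":
    for position, service in enumerate(recovery_order(SERVICES), start=1):
        print(f"{position}. {service.name} (RTO {service.rto_minutes} min)")
```

Keeping the agreed values in one machine-readable catalog like this means the same data can drive both planning documents and recovery tooling.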

Prioritize Recovery of Communications Systems

In any emergency situation, communications are critical. When developing disaster-recovery strategies, put a special emphasis on communication systems, including email, text messaging, landline and VoIP telephones, and internet connectivity. Those services naturally cut across all functional areas within the agency and, therefore, may not rise to the top during RTO and RPO conversations — they’re so basic that most people just assume they’ll work, but they should not be neglected.

Acceptable RTOs for communications systems are generally quite short, so consider contingency plans that provide alternative communication channels as soon as possible. It might be better to provide users with access to a temporary email account in 30 minutes, for instance, than make them wait six hours for the recovery process to complete.

Automate Disaster Recovery

Automation is one way that IT teams can lower costs and reduce failures. The infrastructure-as-code approach to computing reduces the amount of time that IT professionals must spend installing and customizing systems, and that frees them up to focus on value-added activities. On a day-to-day basis, emerging technologies, including software-defined networking and scripts that automate server builds, provide ways to capture the configuration status of production systems and automate future builds.

The same automation technology that allows teams to build servers and networks in a consistent, repeatable fashion offers tremendous benefits in disaster recovery. The scripts used to build infrastructure in a primary data center environment under normal circumstances can often be repurposed quickly for disaster-recovery resources in the cloud as needed. Organizations that already run their primary services in the cloud may turn to automation technology during an emergency and rapidly shift services between regions with the same cloud provider or even between providers.
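
The idea of repurposing build scripts can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not any vendor's tooling: the `ServerSpec` and `Environment` types, the location labels and the image name are all hypothetical, and the "build steps" are plain strings where a real tool would call a provider's API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ServerSpec:
    name: str
    image: str   # hypothetical base image label
    size: str    # hypothetical instance size label
    count: int


@dataclass(frozen=True)
class Environment:
    label: str
    location: str  # hypothetical data center or cloud region name
    servers: tuple


def build_plan(env):
    """Expand an environment definition into an ordered list of build steps."""
    steps = []
    for server in env.servers:
        for index in range(1, server.count + 1):
            steps.append(
                f"{env.location}: create {server.name}-{index} "
                f"from {server.image} ({server.size})"
            )
    return steps


def disaster_recovery_copy(env, location):
    """Reuse the same server definitions, changing only where they are built."""
    return replace(env, label="dr", location=location)


PRIMARY = Environment(
    label="primary",
    location="onprem-dc1",
    servers=(
        ServerSpec("web", "base-linux-v12", "medium", 2),
        ServerSpec("db", "base-linux-v12", "large", 1),
    ),
)

DR = disaster_recovery_copy(PRIMARY, "cloud-region-b")

if __name__ == "__main__":
    for step in build_plan(DR):
        print(step)
```

Because the recovery environment is derived from the production definition rather than maintained separately, the two cannot drift apart in this model; only the target location differs.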

Agencies that adopt automated disaster-recovery approaches should regularly test those strategies. There’s nothing worse than waiting until disaster strikes to find that automated recovery scripts don’t work as expected.
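
One possible shape for such a test is a drill report that compares measured recovery times with the agreed RTOs. The Python sketch below assumes the drill produces a simple mapping of service name to minutes taken; the `evaluate_drill` function and all figures are hypothetical.

```python
def evaluate_drill(rto_minutes, measured_minutes):
    """Return a dict of services that failed the drill, with the reason.

    rto_minutes: service name -> agreed RTO in minutes
    measured_minutes: service name -> minutes the drill took to restore it
    A service missing from measured_minutes was never recovered.
    """
    failures = {}
    for name, rto in rto_minutes.items():
        measured = measured_minutes.get(name)
        if measured is None:
            failures[name] = "not recovered during drill"
        elif measured > rto:
            failures[name] = f"took {measured} min, RTO is {rto} min"
    return failures


if __name__ == "__main__":
    # Hypothetical drill results for illustration only.
    agreed = {"email": 30, "payroll": 240, "web-portal": 120}
    observed = {"email": 45, "web-portal": 90}
    for service, reason in evaluate_drill(agreed, observed).items():
        print(f"FAIL {service}: {reason}")
```

Running a check like this on a schedule turns a silent script failure into a visible report well before a real emergency.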

The high-quality service and on-demand availability offered by cloud providers can give agencies a promising and cost-effective approach to disaster recovery. Cloud providers also offer high-durability storage services that help keep critical agency information safe in the event of a technology disaster.