Virtualization to the Rescue
When the 9.0-magnitude earthquake hit Japan this winter, it caused widespread damage and major IT disruptions, including to systems used by the U.S. military. The quake so disrupted the country's power grid that the Defense Contract Management Agency (DCMA) decided to move systems from Japan to its California data center.
"Those systems are running as virtual systems in our Japan data center," says Peter Amstutz, acting director for architecture and infrastructure at DCMA. "Moving them was a matter of powering them off in a virtual sense, copying the files over here and then turning them on."
The change did not cause even a hiccup for most DCMA users, says Jacob Haynes, acting CIO. The agency simply scheduled the trans-Pacific move for a few hours over a weekend. "We can do that over the weekend because of the infrastructure we have in place."
Virtualization fundamentally changes disaster recovery (DR) planning because of the flexibility it adds by decoupling the operating system and applications from the hardware. "The failover and survivability that was dependent on clustering in the past is now handled on its own," Haynes says.
Server virtualization also adds speed and reduces complexity in continuity of operations (COOP) planning.
Before virtualization, a COOP migration similar to the one completed in just a few hours after the earthquake would have taken days or weeks. These are complex systems, Amstutz points out, and physically shutting down servers and rehosting them thousands of miles away would have required many more people. "Now, it's handled by the data center folks," he says.
It was for just such reasons that DCMA, which manages service and supply contracts for the Defense Department, undertook a major consolidation and virtualization effort over the past few years. The migration to a virtual environment strengthened the agency's disaster recovery capabilities, and ultimately, DCMA consolidated 18 regional data centers down to two mirrored sites, plus a testing location.
DCMA is not alone in grasping the appeal of this approach for COOP. The Census Bureau and the Bureau of Land Management (BLM) have taken a similar technological path toward DR. But officials at all three agencies agree that virtualizing systems and being prepared for a disaster demands that their organizations maintain a technological balancing act. To strike the right balance, they offer five tips.
1. Know your infrastructure.
"We take for granted that we understand the usage of systems," says Haynes. "But I would say most enterprises don't really know." He advises getting a detailed handle on how servers are used, including frequency of usage. Then, scale horizontally rather than vertically at first.
Brian McGrath, CIO of the Census Bureau, adds that it's critical to define clear goals and objectives for any consolidation and virtualization effort. Once technological and business objectives are fleshed out, the IT team can define roles and establish performance measures, McGrath says.
Census began its virtualization program about a year and a half ago, and is "working aggressively" in accordance with the Federal Data Center Consolidation Initiative.
2. Incorporate supporting technologies.
The Census Bureau chose a combination of HP and VMware tools to build, provision and monitor its virtualized environment.
"We're clearly monitoring at a greater level of granularity now that we have a virtual environment," McGrath says.
Configuration management tools can help provision assets, plus monitor capacity and performance. They make it easier for Census to track and move virtual servers as needed. And monitoring is a key piece of disaster planning because "then you're aware of any performance issues or instability," McGrath says. "You can take appropriate actions to ensure that your systems remain stable."
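At its simplest, that kind of monitoring is a set of threshold checks on each host. The sketch below is a rough illustration using the open-source psutil library to sample CPU, memory and disk and flag anything above a limit; the thresholds and the choice of psutil are assumptions for the example, not the Census Bureau's HP and VMware tooling.

```python
"""Sketch: simple threshold checks on one host, in the spirit of the
capacity and performance monitoring described above. Uses the open-source
psutil library; thresholds are illustrative only."""
import psutil

THRESHOLDS = {"cpu_pct": 85.0, "mem_pct": 90.0, "disk_pct": 80.0}


def check_host() -> list[str]:
    """Return a warning for any metric that exceeds its threshold."""
    readings = {
        "cpu_pct": psutil.cpu_percent(interval=1),
        "mem_pct": psutil.virtual_memory().percent,
        "disk_pct": psutil.disk_usage("/").percent,
    }
    return [
        f"{metric} at {value:.1f}% exceeds {THRESHOLDS[metric]}%"
        for metric, value in readings.items()
        if value > THRESHOLDS[metric]
    ]


if __name__ == "__main__":
    warnings = check_host()
    if warnings:
        for w in warnings:
            print("WARNING:", w)  # in practice, feed an alerting system instead
    else:
        print("host within configured thresholds")
```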
Data replication technology for migration is part of what allowed DCMA to move its Japanese data center information quickly. "A lot of technologies in the virtualization space do capacity planning and actual migrations," Haynes says. "The key is that the framework itself has to accommodate the desired end-state of the virtualization initiative."
3. Plan for appropriate storage and backup.
Virtualization means that new systems can be created quickly, but don't forget that you'll have to back up that new data, notes McGrath.
"Because it becomes easy to spin up systems," he says, "we want to make sure we don't exceed either provisioned systems that aren't being backed up or capacity for our backup capability."
Census has been able to keep its existing backup and restore tools, but McGrath has made sure to involve the backup team when provisioning virtual systems. "They really all need to work hand in hand," he says, "so your server, storage and network provisioning, and your backup and restore, need to be in sync."
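One way to keep provisioning and backup in sync is to treat backup capacity as a gate in the provisioning workflow. The check below is a minimal sketch of that idea, comparing a requested virtual server's footprint against what the backup system has left; all of the figures and names are hypothetical.

```python
"""Sketch: gate new VM requests on remaining backup capacity, so systems
aren't spun up faster than they can be backed up. All figures are hypothetical."""

BACKUP_CAPACITY_GB = 50_000       # total nightly backup capacity (assumed)
CURRENTLY_PROTECTED_GB = 42_500   # data already in the backup schedule (assumed)


def can_provision(new_vm_gb: float) -> bool:
    """Approve a new VM only if the backup system can absorb its data."""
    remaining = BACKUP_CAPACITY_GB - CURRENTLY_PROTECTED_GB
    return new_vm_gb <= remaining


if __name__ == "__main__":
    request_gb = 6_000
    if can_provision(request_gb):
        print(f"OK to provision {request_gb} GB; add it to the backup schedule")
    else:
        print(f"Hold the request: {request_gb} GB exceeds remaining backup capacity")
```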
4. Consider the cloud.
As it moves forward with consolidation, the Census Bureau expects to weave in cloud services, McGrath says. The IT team is considering a hybrid approach to ensure continuous operations, using the public cloud for disaster recovery and failover, he says.
That's also a primary reason why the Bureau of Land Management is looking to the cloud as it consolidates data from hundreds of remote offices around the country.
"The core mission for us is managing land, not managing IT," says Patrick Stingley, BLM's chief technology officer.
That focus on mission, along with the consolidation push, is steering the Interior Department agency toward cloud services. BLM has virtualized many systems over the past two years, but its IT group expects that cloud services will be essential to future COOP efforts, Stingley says.
"The virtualization that's most important to us is not running VMs," he says. "It's other kinds of virtualization — things like remote visualization of geographic information systems software and cloud storage — that offer the most benefit
for us."
Cloud-based services ensure continuity of operations, Stingley says. "We have GIS data hosted in remote field offices because of insufficient bandwidth to deliver it over our WAN."
There's some risk that these BLM offices could experience a failure, he says. "Putting the data into a cloud and delivering it via virtual private network may allow us to maintain the data centrally and deliver it effectively to remote offices."
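As a rough sketch of that approach, the snippet below copies a field office's GIS files into central cloud object storage using the AWS boto3 library; the bucket name, paths and the choice of S3 are assumptions for illustration rather than BLM's actual design, and delivery back to the field over a VPN would sit on top of it.

```python
"""Sketch: copy a field office's GIS files into central cloud object storage
so the data can be maintained in one place and served back over a VPN.
Bucket name, prefix and the choice of S3/boto3 are illustrative assumptions."""
from pathlib import Path

import boto3

s3 = boto3.client("s3")
BUCKET = "example-agency-gis-data"  # hypothetical bucket


def upload_gis_dir(local_dir: str, office_code: str) -> None:
    """Upload every file under local_dir to s3://BUCKET/<office_code>/..."""
    base = Path(local_dir)
    for path in base.rglob("*"):
        if path.is_file():
            key = f"{office_code}/{path.relative_to(base).as_posix()}"
            s3.upload_file(str(path), BUCKET, key)
            print(f"uploaded {path} -> s3://{BUCKET}/{key}")


if __name__ == "__main__":
    upload_gis_dir("/data/gis/field_office", office_code="example-field-office")
```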
5. Prepare for cultural changes.
Virtualizing servers at DCMA led to organizational changes for many of the agency's employees.
"When you walk into a data center, the entire room is buzzing and humming, and people are used to touching machines," Haynes says.
59%: the share of organizations applying server virtualization for disaster recovery or continuity of operations. SOURCE: CDW Server Virtualization Life Cycle Report, 2010
Moving from a regional to centralized data center setup meant DCMA had to start thinking like a large enterprise instead of a group of smaller organizations that worked together. As a result, some DR roles changed.
"We've put failover and survivability into the hands of the data center operators, rather than in the hands of the application experts," Amstutz says.
To avoid ill will, Amstutz suggests emphasizing communication and making it easy for managers and users within the agency to understand the technology and the changes it will bring. At DCMA, the agency's central alert and notification system was a huge help in pushing out information, he adds. But that was just a delivery mechanism; what matters more is having a communication strategy in place.
"We took a particular block of systems, then developed a communication plan around those blocks as we were making changes."