People who work at Los Alamos National Laboratory know a thing or two about energy — and moving at the speed of light. As part of the Energy Department, the nation’s leading security research lab found itself under pressure to trim its carbon footprint, but it also was rapidly running out of cooling and power in its data centers. So it turned to virtualization software to curb its kilowatt appetite.
By now, the cost and energy benefits of consolidating data centers are well known, and LANL’s experience was no exception. Using VMware Infrastructure, Hewlett-Packard ProLiant DL585 servers and HP StorageWorks Enterprise Virtual Array 8000/8100 storage area networks, the lab succeeded in decommissioning roughly 100 of its 400 servers and closing three data centers — reducing its energy needs by 873,000 kilowatt-hours per year and saving an estimated $1.4 million annually. The lab was also able to commission 100 new virtual servers without increasing capacity, says Anil Karmel, a solutions architect for the lab.
Even better, LANL started the project in 2006 and demonstrated a return on investment in only nine months — more than a year ahead of schedule.
But for Los Alamos and other agencies, virtualization offers an additional benefit: painless data backup and disaster recovery.
“Disaster recovery is really baked into the design of our infrastructure,” Karmel says. Virtualization “gives us real-time, built-in redundancy.”
Gone are the days when a system failure meant loss of data and productivity, as IT staff struggled to restore the most recent backups of physical machines from tapes or other storage media. Every bit that flows across LANL’s production virtual machine farm is instantly replicated to an alternate data center using HP StorageWorks Continuous Access EVA Software, an array-based replication technology. VM images of assets that need to remain physical are created using PlateSpin’s PowerConvert software for disaster recovery purposes. The lab backs up the entire virtual infrastructure to disk and subsequently to tape using Vizioncore esxRanger with VMware Consolidated Backup software, providing multiple levels of fault tolerance and disaster recovery. If one of LANL’s servers goes down, a duplicate stands ready to take its place almost instantly, at minimal cost.
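The tiered scheme LANL describes — synchronous array replication to an alternate site, then backup to disk, then to tape — can be sketched in miniature. The sketch below is purely illustrative (the class and function names are invented for this example, not the actual HP StorageWorks or Vizioncore APIs); it just shows why the alternate site needs no restore step when the primary fails.

```python
# Illustrative sketch of a tiered protection scheme: synchronous
# replication to an alternate site, then disk backup, then tape.
# All names here are hypothetical, not the real vendor APIs.

from dataclasses import dataclass, field


@dataclass
class DataCenter:
    name: str
    blocks: dict = field(default_factory=dict)  # block id -> bytes


class ReplicatedArray:
    """Mimics array-based replication: every write lands on both sites."""

    def __init__(self, primary: DataCenter, secondary: DataCenter):
        self.primary = primary
        self.secondary = secondary

    def write(self, block_id: int, data: bytes) -> None:
        # Synchronous mirror: the write is complete only once both the
        # primary and the alternate data center hold the same block.
        self.primary.blocks[block_id] = data
        self.secondary.blocks[block_id] = data


def snapshot_to_disk(site: DataCenter) -> dict:
    """Point-in-time disk backup of the whole virtual infrastructure."""
    return dict(site.blocks)


def archive_to_tape(disk_backup: dict) -> list:
    """Final tier: stream the disk backup out to sequential media."""
    return sorted(disk_backup.items())


if __name__ == "__main__":
    primary, alternate = DataCenter("primary"), DataCenter("alternate")
    array = ReplicatedArray(primary, alternate)
    array.write(0, b"vm-image-chunk")
    # The alternate site can take over with no restore step at all.
    assert alternate.blocks == primary.blocks
    tape = archive_to_tape(snapshot_to_disk(primary))
    print(f"{len(tape)} block(s) archived; sites in sync: "
          f"{alternate.blocks == primary.blocks}")
```

Because every write is mirrored before it completes, failover is a matter of pointing workloads at the alternate site rather than replaying backups — which is what makes the "almost instantly" claim plausible.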
“In the event of a primary data center failure, we can bring both our virtual and physical server images online in our alternate data center. This allows us to bring our entire infrastructure up quickly without having to deploy another physical asset,” says Karmel. “In the unlikely event that we lose multiple data centers, we can always restore the machine images to a fresh VMware farm.”
Virtualization makes disaster recovery easier and cheaper because virtual servers can be quickly reconfigured and switched to handle different applications, says Teresa Bozzelli, chief operating officer and managing director of research firm IDC Government Insights.
“If you lose a box that happens to be running the back-office suite that handles payroll, you can very quickly move that application to a different box that’s live on the network,” she says. “Just because a server goes offline doesn’t mean you have to lose the application it’s running. That ensures continuity and makes your disaster recovery plan very cost effective.”
Disaster recovery and vastly improved uptime are prime drivers for agencies such as the Office of the Comptroller of the Currency. The agency moved to a virtual environment in part because of the challenges of taking down and bringing back a traditional client-server environment, according to CIO Bajinder Paul.
The typical maintenance and backup routine at OCC requires backing up 12 terabytes of data and bringing down and restarting 240 servers in a single weekend. Because downtime is not an option, the virtual environment streamlines this process and eases upgrades for the Treasury Department agency.
At the Environmental Protection Agency, moving to virtualization reduces energy, heating and air-conditioning loads, and space requirements in federal computer rooms, says Ken Kerner, IT infrastructure manager for the agency’s Region 10 in the Pacific Northwest. It has also cut back server administration and eliminated some weekend work. Since March, Kerner has been leading an effort to consolidate more than 50 servers in the region’s Seattle headquarters down to fewer than six, which he says will produce energy savings of 25 percent to 30 percent, if not more.
“One of the classic beliefs in IT is that you have to schedule process changes after hours or on weekends,” he says. “As we’ve moved deeper and deeper into virtualization, we’re getting to the point where we trust the system enough to allow us to move all our users off one server and onto another, make the necessary changes and then move them back, during the day while they’re still working.”
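The daytime-maintenance pattern Kerner describes — evacuate a host, change it, repopulate it — can be sketched as follows. This is a hypothetical illustration, not EPA's actual tooling; the function names and the dictionary-based "hosts" are invented for the example.

```python
# Illustrative sketch (not EPA's actual tooling) of the maintenance
# pattern described above: live-migrate users off a server, make the
# change, then move them back -- all during the working day.

def drain(source: dict, target: dict) -> None:
    """Live-migrate every VM off `source` onto `target`."""
    while source["vms"]:
        target["vms"].append(source["vms"].pop())


def daytime_maintenance(host: dict, spare: dict, change) -> dict:
    drain(host, spare)   # users follow their VMs to the spare host
    change(host)         # host is now empty, so it is safe to change
    drain(spare, host)   # move the VMs (and their users) back
    return host


if __name__ == "__main__":
    esx1 = {"name": "esx1", "patched": False, "vms": ["mail", "web"]}
    esx2 = {"name": "esx2", "patched": True, "vms": []}
    daytime_maintenance(esx1, esx2, lambda h: h.update(patched=True))
    print(esx1)  # patched, with both VMs back in place
```

The point of the pattern is that the guests, not the users, absorb the disruption: from the user's side the application never goes away, so the after-hours maintenance window becomes unnecessary.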
Kerner points out that while cost savings please agency IT organizations, it’s added data security that matters most. “The most valuable thing we have is our data,” he says. “It is IT’s primary responsibility to safeguard and keep that data available. As long as we have redundancy in our data storage, it’s much easier to gin up a new server and bring it online.”
Leaping Beyond Limitations
Of course, virtualization also poses a few challenges. Determining which applications can effectively inhabit the same physical machine and which require dedicated CPUs takes careful planning. Many agencies have the additional burden of needing to store sensitive data, such as medical records, on separate machines for regulatory compliance, which can make consolidating data centers tricky.
“The extra security required by federal regulations adds layers of complexity and control, which can reduce the optimal virtualized scenario,” says Government Insights’ Bozzelli.
Another pitfall agencies encounter comes from focusing too much on the technology and not enough on how it can help them accomplish their broader mission, says Dennis Lasley, vice president of integration for consulting firm Accent Global System Architects, which is working on a virtualization project within a division of the Internal Revenue Service.
“One of the problems with virtualization is that technology often ends up driving the discussion,” Lasley says. “Technological decisions get made before the business processes are fully understood, and as a result, organizations aren’t leveraging the technology as efficiently as they could.”
By focusing solely on the low-hanging fruit of cost and energy savings, Lasley says, agencies might miss out on other potential advantages that virtualization can bring, such as the ability to deploy remote desktops and enable a more effective mobile workforce.
Agencies contend they’ve gotten more than their money’s worth from their virtualization efforts. Kerner says that in addition to cutting costs and reducing carbon emissions, a key benefit for EPA is the flexibility that VMware’s management console provides in controlling network resources.
“We’ve seen a huge gain in tracking how CPUs, memory and storage are being utilized,” he says. “It helps us plan for the needs of the network. If we see an Oracle server starving for memory, we can quickly start another instance of that server, take down the first one and add more memory or CPUs, then bring it back up. It’s helped us to be a lot more efficient.”
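The rebalancing loop Kerner describes — spot a starved guest, stand up a larger replacement, retire the old one — reduces to a simple check-and-resize cycle. The sketch below is a hypothetical illustration with invented names and thresholds, not the VMware management console's API.

```python
# Hypothetical sketch of the rebalancing workflow described above:
# find memory-starved guests and replace each with a larger instance.
# Field names and the 90% threshold are illustrative assumptions.

def find_starved(guests: list, threshold: float = 0.9) -> list:
    """Return guests using more than `threshold` of allocated memory."""
    return [g for g in guests if g["mem_used"] / g["mem_alloc"] > threshold]


def resize(guest: dict, factor: int = 2) -> dict:
    """Return a replacement guest with `factor`x the memory allocation."""
    return dict(guest, mem_alloc=guest["mem_alloc"] * factor)


if __name__ == "__main__":
    farm = [
        {"name": "oracle1", "mem_alloc": 8, "mem_used": 7.8},  # starved
        {"name": "web1", "mem_alloc": 4, "mem_used": 1.2},     # healthy
    ]
    for guest in find_starved(farm):
        farm[farm.index(guest)] = resize(guest)
    print([(g["name"], g["mem_alloc"]) for g in farm])
```

On physical hardware the same fix would mean ordering and racking memory; here the "bigger server" is just a new instance definition, which is the efficiency gain Kerner is pointing at.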
OCC’s Paul notes that virtualization allows his agency to communicate more effectively with its more than 1,700 national banks, as well as the Federal Deposit Insurance Corp. and the Federal Reserve Board. It also lets OCC roll out applications to community banks — such as software that helps them evaluate financial risk — in half the time it once took, in part by reducing the number of test environments the apps had to run under.
The virtualization benefits for OCC are many, according to Paul: It improves responsiveness, allows the agency to quickly respond to unplanned server requests, reduces acquisition turnaround, increases capacity usage and provides fast implementation.
Sound too good to be true? Many people think so, says LANL’s Karmel. “A lot of people think there must be a catch, but we’ve yet to see one,” he says. “We gained things we never had before — like the ability to provision a server in 30 minutes instead of 30 days. We haven’t had any ‘gotchas.’ ”