
How Disaster Recovery Plans Keep Agencies Running

March’s “bomb cyclone” threatened operations at Offutt Air Force Base and the National Weather Service in Omaha, but advance planning prevented disaster.

Federal workers know that helping citizens recover from natural disasters can be part of the job. But sometimes those employees find their agencies upended by the same emergencies that threaten and disrupt their neighbors.

Disasters both natural and unnatural (hurricanes, earthquakes, floods, tornadoes, fires, terrorist attacks and more) that sweep through civilian communities can and do leave their mark on federal property as well.

To ensure continuity of operations, agencies are building redundancy into their IT infrastructure, using data backup and recovery software and secondary data centers, and deploying extra network equipment.

A good disaster recovery plan starts with a threat analysis to identify potential risks at specific locations, then creates a runbook, which provides a set of procedures on how to respond to disasters and covers people, processes and technology, says IDC Research Director Phil Goodwin.

“It’s the ‘Who does what, when, why and how?’” Goodwin says. “It starts with declaring a disaster and what steps are taken. At the beginning of the threat, these people are notified and are responsible for taking certain actions, such as failover contingencies.” And, as shown by a recent billion-dollar flood in Nebraska that the National Weather Service termed “historic,” stocking up on spare technology and having a plan for backup offices are also smart strategies.
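Goodwin's "who does what, when, why and how" framing maps naturally onto a simple data structure. The following is a minimal sketch of that idea, not a format used by IDC or any agency mentioned here; the class names, fields and sample entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunbookStep:
    """One 'who does what, when, why and how' entry (all fields hypothetical)."""
    who: str
    what: str
    when: str
    why: str
    how: str


@dataclass
class Runbook:
    """A runbook for one threat identified in the threat analysis."""
    threat: str
    steps: List[RunbookStep] = field(default_factory=list)
    declared: bool = False

    def declare_disaster(self) -> List[str]:
        """Mark the disaster as declared and list who is notified, in order."""
        self.declared = True
        return [f"Notify {s.who}: {s.what} ({s.when})" for s in self.steps]


# Illustrative entries only; a real runbook would come from the agency's own plan.
flood_runbook = Runbook(
    threat="river flooding",
    steps=[
        RunbookStep(
            who="IT operations lead",
            what="start failover to the secondary data center",
            when="on declaration",
            why="keep critical services available",
            how="follow the failover checklist",
        ),
        RunbookStep(
            who="facilities team",
            what="move network equipment to higher ground",
            when="within two hours of declaration",
            why="protect equipment that is hard to replace",
            how="use the equipment priority list",
        ),
    ],
)

notifications = flood_runbook.declare_disaster()
for line in notifications:
    print(line)
```

Keeping each step as structured data rather than free text makes it easier to check that every action has a named owner and a trigger before a disaster is declared.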


Air Force Base Springs into Action to Save IT from Flood 

The disaster began in mid-March with a “bomb cyclone” — a weather phenomenon in which the atmospheric pressure inside a storm drops precipitously in a very short time, comparable to what happens inside a massive hurricane. The storm hit the center of the country with everything from dangerous blizzards to rapid flooding; three people died.

On Friday, March 15, the crisis action team at Offutt Air Force Base, about 12 miles south of Omaha, Neb., eyed the swelling Missouri River and informed officials that they had less than 48 hours before the rising water would reach the base.

Lt. Col. Joseph Videc, the 55th Communications Squadron commander, immediately met with his staff to discuss plans, and they began evacuating IT equipment from endangered buildings just two hours later. Separate teams began moving munitions and aircraft to higher land.

Airmen fortified the southeast side of the base with 235,000 sandbags and 460 flood barriers, while about 30 IT staffers rushed from building to building to unplug Cisco switches, network encryption devices and uninterruptible power supplies, loading them onto pickup trucks and moving them to facilities on higher ground.

$4.7 million

The value of IT and satellite equipment saved during preflood evacuations at Offutt Air Force Base.

Source: Offutt Air Force Base

“We focused on the expensive equipment. We didn’t have the manpower to collect PCs, which are easily replaced,” says Maj. Mike Scott of the 55th Communications Squadron, who led the effort. “We told organizations, ‘If you are concerned about your PCs and have a place to move them, please do so. Otherwise, at least move it on top of your desk.’”

Squadron members initially tried to label the network cables so staff would know where each one went. “But as we started to run out of time, we were just pulling and grabbing,” Scott says.

The squadron moved IT equipment around the clock until buildings took on water that Saturday night. Some mission-critical facilities had to remain operational until they were forced out by the flood. “We couldn’t pull some devices until the very last minute,” Scott says.


Offutt Air Force Base Keeps Operations Running 

The flood water, caused by record snowfall followed by a rapid melt, overwhelmed existing levees and the makeshift barriers. The result: $420 million in damage at Offutt alone. 

Damage across the six most affected states totaled more than $1 billion, and the final amount has yet to be calculated, according to the National Weather Service.

The flood covered one-third of the 4.3-square-mile base, inundating 137 structures and 44 occupied buildings with 1.2 million square feet of office space, among them the 55th Wing’s headquarters. Fortunately, no major IT infrastructure or critical data was lost because the data centers are located elsewhere on the base, says Scott. The squadron’s evacuation effort saved $1.7 million worth of IT equipment and another $3 million worth of satellite communications equipment.

In the immediate aftermath, base leadership prioritized getting mechanics back online first, so they could perform maintenance on jets and make them available for pilots.

The Communications Squadron’s IT team quickly turned the base’s community center into a makeshift office for 200 personnel by installing four spare Cisco switches and equipping staff with new laptops from storage.

The squadron also created temporary offices for the rest of the 3,200 displaced personnel in the base’s community center, hotel and conference center. They used spare technology housed in storage, including Cisco switches and Windows laptops.

They placed Cisco IP phones in new office spaces and partnered with cellular carriers to provide mobile hotspots in rooms not wired for network connectivity.

“It was a scramble to find real estate, but the community center can fit a lot of people,” Scott says. “We got personnel working again within 96 hours of the flood starting.”

In the ensuing weeks, other Air Force bases shipped spare laptops, docking stations, monitors, keyboards and mice. Offutt personnel used an imaging server to image about 20 computers every two to three hours.

So far, only one facility in the flooded area is back in operation. Offutt, one of at least five U.S. military bases severely damaged by weather or earthquake since September 2018, is awaiting funding to repair and replace damaged structures filled with what is now considered contaminated waste.

Computers were disposed of and their hard drives destroyed, but no critical information was lost, Scott says. All mission data is backed up within the base’s data centers and again offsite.

“Mission data is stored in shared drives, SharePoint or our data centers, and all that data was unaffected by the flood,” he says. “We didn’t lose any critical IT assets.”


National Weather Service Office Evacuates to Backup Sites 

At the same time, meteorologists in the Omaha National Weather Service office, located about 30 miles west of the city near the Platte River, found themselves in a parallel situation, watching the river close in even as they had to keep forecasting and sending alerts.

The NWS, which operates 122 forecast offices nationwide, regularly creates contingency plans for business continuity if individual offices have to evacuate or are damaged in a disaster, says COO John Murphy.

Each forecast office has primary and backup network circuits and is designed to withstand different strengths of storms. The Key West, Fla., office, for example, is built with reinforced concrete and bulletproof glass that can withstand a Category 5 hurricane; in 2017, meteorologists there were able to ride out Hurricane Irma, a Category 4.

But sometimes, the NWS has to evacuate. With levees on the Platte failing and roads beginning to close, the Omaha staff made the decision to go. They powered down all the electrical systems and IT equipment, including the computers and servers that run the Advanced Weather Interactive Processing System for weather forecasting.

“They made sure the AWIPS equipment was safe. They shut down the electrical systems so if the water inundates the office, it doesn’t destroy the equipment,” Murphy says.

The NWS office in Hastings, Neb., about 140 miles southwest of the Omaha office’s location in Valley, Neb., took over weather operations as the Omaha employees fanned out to backup sites, including the Nebraska State Emergency Operations Center, the NWS Central Region headquarters in Kansas City, Mo., and the NWS Hastings office.

When the Omaha NWS meteorologists were all in place, they took back some of the decision support, Murphy says. Forecast offices use the same AWIPS technology, so working in the Hastings office was a seamless transition, says meteorologist Cathy Zapotocny.

“Within a few hours, our Omaha team was able to provide support from the Hastings office,” she says. “The Hastings team were great hosts. They assisted us in a lot of ways.”

Zapotocny started the day at the state’s emergency operations center, so she didn’t have to move during the evacuation. She was onsite at the emergency center to provide emergency response teams the latest flood information, which she accessed through internal NWS web resources on her laptop.

Fortunately, the Omaha office suffered no water damage. 

“We got lucky,” Murphy says. “The water was right up to the door, but never got inside.” 


FAA Embraces the Cloud for Data Backup and Disaster Recovery 

Some emergencies expose flaws in continuity plans. The Federal Aviation Administration took lessons learned from fires that damaged IT equipment to reinforce redundancy throughout its IT and network infrastructure.

In 2014, a contractor deliberately set fire to the Air Route Traffic Control Center in Chicago, which covers 91,000 square miles of high-altitude airspace in the Midwest and is the fifth busiest center in the country. 

The fire forced other centers to temporarily take over air traffic control and required IT staffers to rebuild network and IT operations within two weeks.

A year or two before, an accidental fire broke out at the FAA’s Atlantic City data center and took down critical systems, says FAA acting CIO Sean Torpey. “We found some blind spots and vulnerabilities,” he says. “So we redesigned infrastructure and rearchitected some of our continuity of operations.”

The IT staff has identified its high-value assets and makes sure there is a disaster recovery plan for each, Torpey says.

“We have continuity of operations plans for individual IT programs and applications,” he says, “as well as continuity of operations plans for Trusted Internet Connection, two points of presence for internet access, firewalls and VPNs.”

For example, if the main data center in Oklahoma City goes down, major applications and VPN services will automatically fail over to the backup data center in Atlantic City.
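The failover behavior Torpey describes can be illustrated with a short priority-order sketch. This is an assumption-laden illustration, not the FAA's actual mechanism: the site names come from the article, but the health flags, the function name and the selection logic are hypothetical.

```python
from typing import Dict, List, Optional

# Priority order taken from the article: primary site first, then the backup.
SITES: List[str] = ["Oklahoma City", "Atlantic City"]


def pick_active_site(health: Dict[str, bool],
                     sites: List[str] = SITES) -> Optional[str]:
    """Return the first healthy site in priority order, or None if all are down.

    `health` maps a site name to True (up) or False (down); a site missing
    from the mapping is treated as down. How health is measured is out of
    scope for this sketch.
    """
    for site in sites:
        if health.get(site, False):
            return site
    return None


# Normal operation: the primary site serves traffic.
normal = pick_active_site({"Oklahoma City": True, "Atlantic City": True})

# Primary outage: services move to the backup site.
after_outage = pick_active_site({"Oklahoma City": False, "Atlantic City": True})

print(normal)
print(after_outage)
```

Returning `None` when no site is healthy makes the total-outage case explicit, which is the situation a cloud-based backup tier is meant to cover.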

On the network front, the FAA subscribes to fully redundant ISP services via diverse providers and deploys dual core routers and redundant switches at each facility. The agency also uses storage and backup solutions to back up regional and field office file servers from the data centers.

“If a site gets hit by a weather system or man-made catastrophe, we will have backups of all the information,” Torpey says.

The FAA plans to rely more on the cloud for data backup and disaster recovery. The agency has migrated to Microsoft Office 365 and will begin using OneDrive for file storage.

The FAA has also begun migrating to Microsoft Azure for backup data center services and plans to deploy a “hot-hot” configuration, so if the agency’s in-house data centers go down, mission-critical applications will continue operating in the cloud, he says.

“We are all about redundancy and high availability of air traffic services. It’s built into our DNA,” Torpey says.

Photography by Colin Conces
Jul 29 2019
