“You come back in from the mine, you’re tired, you’re dirty, you’ve had a long, physical day in some cases. Back in the office, you log onto your computer to open a document over the network. As simple a concept as that was, the delays due to networking made it pretty much unusable.”
What’s more, says Chris Weaver, a supervisory coal mine inspector based at the Mine Safety and Health Administration’s Bridgeport, W.Va., field office, the screen would display the hourglass symbol, continually refreshing, without loading the content of the page.
The IT team has turned the process on its head, Weaver says: “They took a system that was unusable and made it to where you could hardly tell you were on a network. It made it feel like you were working off your own hard drive.”
Inspectors used to spend days at a mine doing onsite inspections and then hours back in the office filling out an inspection checklist and other paperwork. But the first time the agency tested its new Cisco Systems Wide Area Applications Services (WAAS), the time it took to open a 12.5-megabyte inspection checklist over the network dropped to 55 seconds from 1 minute and 50 seconds, says George Fesak, director of MSHA’s Program Evaluation and Information Resources (PEIR). After caching the document, each ensuing access took just 2.2 seconds.
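The figures Fesak cites can be turned into speedups and effective transfer rates with a little arithmetic. The sketch below uses only the numbers quoted above (12.5 megabytes, 1 minute 50 seconds, 55 seconds and 2.2 seconds); the function and constant names are illustrative, not anything MSHA or Cisco uses.

```python
# Timings quoted by Fesak for opening a 12.5-megabyte inspection checklist.
FILE_SIZE_MB = 12.5
BEFORE_S = 110.0       # 1 minute 50 seconds, before WAAS
FIRST_OPEN_S = 55.0    # first open with WAAS
CACHED_OPEN_S = 2.2    # later opens, once the document is cached


def speedup(baseline_s: float, new_s: float) -> float:
    """Return how many times faster new_s is than baseline_s."""
    return baseline_s / new_s


def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Return the effective transfer rate the user experiences."""
    return size_mb / seconds


if __name__ == "__main__":
    cases = [
        ("before WAAS", BEFORE_S),
        ("first open", FIRST_OPEN_S),
        ("cached open", CACHED_OPEN_S),
    ]
    for label, secs in cases:
        print(
            f"{label:12s} {secs:6.1f} s  "
            f"{speedup(BEFORE_S, secs):5.1f}x  "
            f"{throughput_mb_per_s(FILE_SIZE_MB, secs):5.2f} MB/s"
        )
```

By this arithmetic the first open is twice as fast as before, and a cached open is roughly 50 times faster, which works out to about 5.7 megabytes per second against roughly 0.11 before.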
Last fall, MSHA began installing an EMC storage area network and Cisco WAAS to speed its communications and the exchange of information with people out in the field. The goal is to make sure anyone who needs to can pull up inspection reports from the server to access current mine information, accident and injury information, or the latest version of a mine map during an emergency or an investigation.
By consolidating its information infrastructure from many far-flung, aging servers to centralized data centers, the agency expects to save money on IT equipment, maintenance and personnel, Fesak says. Now, MSHA maintains two primary production data centers, in Lakewood, Colo., and Arlington, Va., and a third, smaller center in Beckley, W.Va., where the agency runs a training academy. The two production sites can store about 15 terabytes of data each (and are expandable to between 280TB and 960TB, depending on the size of the hard drives) and support automated replication on their EMC Symmetrix DMX-3 SAN platforms. If the Lakewood or Arlington site is unavailable for any reason, MSHA employees can log on to the alternative sites, regardless of where they are working.
More than a year ago, MSHA’s technology staff had begun looking at ways to improve the systems that support the agency’s disparate users, but mining accidents last summer intensified those efforts. The collapse of the Crandall Canyon Mine trapped and killed six coal miners thousands of feet underground in Huntington, Utah, on Aug. 6, 2007. Ten days later, another collapse killed three rescue workers and injured six others.
MSHA maintains detailed inspection records and maps for every U.S. mine, and inspectors, as part of their work, constantly update these records so that the most current information is available should an accident occur.
To gather this information, inspectors check thousands of square miles of mines. Much of this data was collected as handwritten notes, resulting in half-inch-thick inspection reports of mine conditions and any violations found. Because these documents can be susceptible to error, MSHA several years ago developed an inspection-tracking system with checklists that it stored online. The IT team also developed the MSHA Standardized Information System (MSIS), which runs on Sun Solaris and stores the inspection filings in an Oracle database.
But over time, as it added new functions to its system, MSHA also piled up servers, along with the expenses of running and maintaining them. Security concerns loomed large, too, because patches were difficult to implement and the servers were not being kept adequately up to date, notes Fesak. Most important, opening and filling out a mine-inspection checklist, or any other document using basic Microsoft software, grew extremely slow.
After the Crandall Canyon disaster, MSHA was inundated with requests for information, along with two congressional subpoenas for material that included e-mail, Fesak says. MSHA wanted to be responsive to Congress, he says, “but it was very painful and took a lot of resources.”
The archiving capabilities of the old system were limited, and there was no way MSHA could do e-discovery or a global search, points out Syed Hafeez, deputy director of PEIR. A company in New York told MSHA it would cost $1.2 million to extract all the e-mail from 10 locations and then search the extracted data with a set number of keywords. MSHA ended up using its limited resources to complete this in-house.
Further, there was no centralized storage system, MSHA could not sync its three data centers, and backup was a “nightmare,” Hafeez says. “Either we didn’t have it, or it was not available when we needed it.”
When management needed to implement better tracking of inspections, the IT team basically took its lead from the technology, Fesak says. “We investigated what was actually happening in the field and put the facts together with our findings and sold it to management.”
He says he wishes the archiving system had been in place before MSHA needed it. “What it cost us to find those e-mails and get them to Congress would have paid for the archiving system,” Fesak says.
Striking Pay Dirt
One of the reasons MSHA went with the EMC and Cisco package was the initial response users had when IT ran a pilot to test the products and services at two sites, says Dan Custer, the agency’s systems administrator for MSIS.
The new infrastructure incorporated MSHA’s existing assets, such as tape backup. The agency’s local data center servers have access to the centralized data through both SAN and network-attached storage. MSHA is in the midst of implementing the first stage of its SAN with some Oracle test data and moving data from one Exchange Server, Custer says.
“Performance is quite a bit better,” he says. “I do know that backups already are going four times faster, and that’s purely by virtue of improved disk performance. That’s without changing our backup methodology, which we do plan to do later, and it will probably be even faster than that.”
WAAS lets MSHA extend its highly secure and available infrastructure and gives users in each remote office the feeling that they are tapping local servers and storage.
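The effect described here, where the first open of a document is slow and later opens feel local, is what any read-through cache provides. The sketch below is a generic illustration only, not Cisco's WAAS implementation, which the article does not describe; `BranchCache`, `fetch_from_data_center` and the file name are hypothetical names invented for the example.

```python
from typing import Callable, Dict


class BranchCache:
    """Generic read-through cache: serve locally when possible, else fetch once."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote   # slow path across the WAN
        self._store: Dict[str, bytes] = {}  # local copies, keyed by document name
        self.remote_fetches = 0             # counts trips across the WAN

    def open(self, name: str) -> bytes:
        if name not in self._store:
            # Cache miss: pay the WAN cost once and keep a local copy.
            self._store[name] = self._fetch_remote(name)
            self.remote_fetches += 1
        return self._store[name]


def fetch_from_data_center(name: str) -> bytes:
    """Hypothetical stand-in for a slow read from a central file server."""
    return f"contents of {name}".encode()


cache = BranchCache(fetch_from_data_center)
first = cache.open("inspection_checklist.doc")   # goes to the data center
second = cache.open("inspection_checklist.doc")  # served from the local copy
```

Only the first call crosses the network; the second returns the stored copy. A real WAN optimizer also has to notice when the central copy changes, which this sketch leaves out.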
MSHA was able to install the devices within weeks, and they dovetailed with the existing routing infrastructure. The agency had Cisco and EMC run a train-the-trainers session, and the EMC team then trained MSHA staff.
Now, the agency is incorporating a tiered-storage strategy to economically manage its information with EMC Centera, a content-addressable storage platform. The equipment is all in place, and the services team is doing data and application migration. Older information is being moved to a background archive environment that is available on disk and driven by policy.
Mining for ROI
Getting to this point wasn’t a cakewalk. An initial rough spot came during the WAAS installation, when MSHA had to work with its third-party network services provider, Sprint. That added another layer of complexity, which is common in the federal environment because of the FTS2001 and Networx telecom contracts.
Another issue was power at the Lakewood, Colo., site. “We were pretty close to the edge to start with,” Custer says. “When the system arrived, we found out we needed four 50-amp, three-phase circuits — a total of 200 amps. We weren’t ready for that.”
Hafeez adds, “You have to worry about your UPS, too. The UPS we have doesn’t have that much juice.”
For the short term, MSHA is using building power at that site but plans to install another UPS with a generator. By contrast, MSHA’s Beckley, W.Va., site has a brand-new data center with plenty of capacity and power. “Everything is there for the future,” Hafeez says.
Another limitation was bandwidth because the agency had been operating in a highly distributed fashion. But WAAS helped MSHA overcome that, letting the agency build the new infrastructure with limited bandwidth and still improve speed and productivity.
“Bandwidth is not cheap — plus management,” Hafeez says. “So this little investment improves the performance of our network and also saves the time of our inspectors who now have more time to do their main job.”
Fesak says MSHA will see a fiscal return on investment over the next three years through reduced costs for servers and power consumption, because the new storage system will service virtual servers and blade servers. Backup expenses for tape and offsite storage should also be limited.
It’s hard to quantify how much the agency might save on e-discovery costs, Custer adds. But if the past year is any indicator, Fesak notes, it will be a lot.
There’s also a crucial but intangible ROI factor to consider, says Weaver: identifying mine hazards and getting them corrected before people get hurt — the ultimate return on investment for an agency whose mission is to reduce the frequency and severity of mine accidents.
“Any savings on our time on the computer means more time we can spend in the field,” Weaver says.