Deep in a bunker in a Virginia hillside, robots and computers are running 24x7 to preserve, digitize and archive the collective memory of American culture. From the early films of Thomas Edison to TV sitcoms, Super Bowl ads, folk music recordings and modern-day oral histories, the Library of Congress Packard Campus of the National Audio-Visual Conservation Center (NAVCC) shelters more than 6 million audio and visual artifacts. A unique end-to-end technology architecture lets the library maintain these treasures for posterity at a rate of multiple petabytes per year.
The center’s digital laboratories and resources will enable the library to preserve more than 50 percent of these deteriorating collections by 2015 (compared with just 5 percent in the same time period using traditional analog methods). It is an “audio-visual legacy that might otherwise be lost to the ravages of time or indifference,” according to Librarian of Congress James H. Billington.
Located in Culpeper, Va., about an hour and a half southwest of Washington, D.C., the Packard Campus gives new life to a mammoth bunker originally designed to store cash and protect government officials in the event of a nuclear attack. The government decommissioned the facility in 1993, and in 1997 Congress passed a law to let David Packard and the Packard Humanities Institute (PHI) buy the property and develop it as a donation to the federal government.
The Packard Campus was made to order, with a robust internal network to manage unusually large loads of data at rates never dealt with before at the Library of Congress, says Thomas Youkel, systems programmer for Information Technology Services (ITS) in the library’s Office of Strategic Initiatives. “We’re talking about ingesting and processing terabytes per day,” Youkel says. Multiple 10-Gigabit Ethernet networks span the building to handle the data flow.
“One of the clichés that is often used about NAVCC that’s actually useful is that it’s a file-making facility,” says Mike Handy, chief of the Automation Planning and Liaison Office in Library Services. Multiple digital conversion machines are at work day and night converting video, film and audio to digital files.
For example, Lawrence Berkeley National Laboratory scientists created IRENE (Image, Reconstruct, Erase, Noise, Etc.) — what Handy terms an “optical vinyl disk reader.” Without ever touching a stylus to a fragile record, IRENE uses digital imaging technologies to generate high-resolution digital maps of the grooved surface of recordings. With this front-end tool, preservationists can rebuild damaged or broken recordings and capture clean sounds even from deteriorated recordings.
Once files are digitized, they are stowed away in “the largest archive that we’ve ever envisioned,” Handy says. The files reside in an SL8500 tape library that uses a custom-built archive interface created by library Systems Analyst Sarah Gaymon and her team. Using Sun StorageTek Storage Archive Manager and File System software (SAM-FS), which administers hierarchical storage management (HSM), library programmers wrote an interface utility that handles the demands of incoming audio and visual material and lets the applications talk to the archive.
The sheer volume of digital information created required the development of a sophisticated archival system for long-term storage. The library’s solution was possible in part because it could use the backup and recovery capabilities of the Legislative Branch Alternate Computing Facility, a remote and secure disaster recovery facility, to house the archived files.
In March 2004, the library brought together staff from its Office of Strategic Initiatives with a few high-performance computing and telecommunications people, representatives of various storage manufacturers, and IT people from the moving-image side (read: Hollywood types). The group spent two days talking through just what it was proposing to build, from a technology architecture perspective. The central question: How many megabits per second would be coming down the line to the archive?
The result was a spreadsheet about 20 lines deep and six columns across, Handy says. “The high-performance computing guys just sat there and cranked it out and cranked it out,” he says. “And the magic number at the end was something like 4,800 megabits per second. And everybody stepped back and said, ‘Whoa! That’s a lot of data. How are we going to do this?’ ”
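To put that figure in context, here is a minimal back-of-the-envelope sketch that converts a sustained line rate into a daily data volume. It assumes decimal units (1 terabyte = 10^12 bytes) and a load sustained around the clock; the function names are illustrative only and are not drawn from the library's planning spreadsheet.

```python
# Back-of-the-envelope conversion of a sustained line rate to daily volume.
# Assumptions (not from the article): decimal units and a constant 24-hour load.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def terabytes_per_day(megabits_per_second):
    """Return terabytes moved in one day at a constant rate in megabits/s."""
    bits_per_day = megabits_per_second * 1_000_000 * SECONDS_PER_DAY
    bytes_per_day = bits_per_day / 8
    return bytes_per_day / 1e12


def fraction_of_link(megabits_per_second, link_gigabits_per_second=10):
    """Return the share of one link's nominal capacity used by the load."""
    return megabits_per_second / (link_gigabits_per_second * 1000)


if __name__ == "__main__":
    rate = 4800  # the approximate figure Handy recalls, in megabits per second
    print(f"{terabytes_per_day(rate):.2f} TB per day")
    print(f"{fraction_of_link(rate):.0%} of one 10-Gigabit Ethernet link")
```

Under those assumptions, roughly 4,800 megabits per second works out to about 52 terabytes per day, or a little under half the nominal capacity of a single 10-Gigabit Ethernet link.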
Ultimately, the library deployed a multitiered system that consists of Sun Fire x64 servers and a Sun storage area network running the Solaris 10 operating system.
Even with an ideal locale and adequate funding, the center faced technological hurdles from the outset.
In the 1990s, when the library management team started talking about what it might want to do with the new facility, Handy says it was inconceivable that the library would be able to connect telecommunications to Culpeper and link the facility back to Washington — the infrastructure didn’t exist.
But the arrival of masses of fiber-optic cable in the region just a few years later opened the path. It also helped that the library’s disaster recovery facility, located halfway between the two sites, was available to serve as a hub for the new high-speed fiber-optic link.
These connections give researchers the ability to see or hear copies of the digital files on Capitol Hill in the Library of Congress’ Motion Picture and Recorded Sound reading rooms.
Another early hurdle was storage. Initially, there simply was not a viable storage solution for the volume of data the systems team was anticipating, Handy says. Fortunately, the technologies caught up by the time the library was ready to begin creating solicitations.
Congress has contributed $82.1 million since 2001 for operations, maintenance, equipment and related costs. But the $150 million gift from PHI, one of the largest-ever private gifts to the U.S. government, is what made the Packard Campus possible. That makes measuring return on investment a little different from how it is measured for congressionally appropriated programs.
“In terms of the investment, I think it’s important to remember that what NAVCC does is followed closely by audio/visual libraries and archives around the world,” Handy says. “So some of the equipment they’re developing in this facility other institutions are taking up to use with their own collections. So it’s sort of a best practices facility. It has become the benchmark against which you would judge your own archive preservation facility.”