Library of Congress Moves to Scan and Store Digital Treasures
The library first received the Parks materials in late 2014 and formally opened them to researchers in the library’s reading rooms in February 2015. The collection was digitized and put online in February 2016; it is a gift to the library from the Howard G. Buffett Foundation.
The collection contains around 7,500 items in the library’s manuscript division, as well as 2,500 photographs in the prints and photographs division. Parks’s notes and correspondence describe the events surrounding her arrest in 1955 for disorderly conduct after she refused to give her bus seat to a white passenger, as well as the subsequent Montgomery Bus Boycott, a key event in the civil rights movement.
Once the library scans documents, it needs to store them. Thomas Rieger, manager of digitization services at the library, says, “The long-term storage requirements are absolutely astonishing. They’re enormous. I would hesitate to put a number out there, because it’d be wrong by the time I say it. It’s really, really massive.”
In October, Librarian of Congress Carla Hayden announced a five-year plan for a “digital transformation” of the library. The plan calls for the library to “continue our aggressive digitization program,” which prioritizes the library’s unique treasures, and to “improve search and access services that facilitate discovery of materials in both physical and digital formats.”
The plan notes that “digital media are subject to degradation just like physical materials, and preserving the utility of older computer files requires trained technical expertise.” The library will work to ensure that digital items in its collection have “a verifiable chain of custody to ensure authenticity as objects are moved between storage media, updated, or migrated between formats.”
Additionally, the library says it will “continue to investigate and practice methods of emulation and migration to provide continued usability of files and programs as technology evolves.”
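The “verifiable chain of custody” the plan describes is commonly approached with fixity checks: a checksum is recorded before a file moves and compared again afterward. The sketch below illustrates that general idea in Python and is not the library’s actual tooling. The function names `sha256_of` and `copy_with_fixity` and the shape of the custody record are assumptions made for illustration.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity(src, dst):
    """Copy src to dst and confirm the copy is bit-identical.

    Returns a custody record (a plain dict; the format is hypothetical).
    Raises ValueError if the digests before and after the copy differ.
    """
    before = sha256_of(src)
    shutil.copyfile(src, dst)
    after = sha256_of(dst)
    if before != after:
        raise ValueError(f"fixity mismatch copying {src} to {dst}")
    return {"source": src, "destination": dst, "sha256": before}
```

Storing the returned record alongside each move gives a later auditor something to recompute and compare against, which is what makes the chain verifiable rather than merely asserted.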
In August, Accenture was awarded a $27.3 million contract to build the long-planned new data center for the Library of Congress. The three-year contract calls for the construction of both a physical data center and other hosting environments, including cloud services, according to Roll Call.
Rieger notes that the cost of storage will continue to drop, which makes decisions about storing documents in high-quality formats that have large file sizes easier to navigate.
“Recognizing how much it’s going to cost to store it is tempered somewhat by the recognition that, over time, that cost will go down,” he says. “So, we find that balance is really the answer there. There are ways to compress files that do no harm. There are other ways that do a tiny bit of harm, but nothing significant. We make that judgment on each project.”
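Compression that does “no harm” is usually called lossless: the original bytes can be recovered exactly. As a minimal illustration, assuming Python’s standard `zlib` module stands in for whatever formats the library actually uses, the round trip below shrinks a sample and then restores it byte for byte. The helper names `compress_losslessly` and `restore` are hypothetical.

```python
import zlib


def compress_losslessly(data: bytes, level: int = 9) -> bytes:
    """Compress bytes with DEFLATE; no information is discarded."""
    return zlib.compress(data, level)


def restore(compressed: bytes) -> bytes:
    """Recover the exact original bytes from compress_losslessly output."""
    return zlib.decompress(compressed)


# Highly repetitive sample data, chosen so the size reduction is visible.
sample = b"Montgomery Bus Boycott " * 1000
packed = compress_losslessly(sample)

# The round trip must reproduce the input exactly.
assert restore(packed) == sample
print(len(sample), "bytes before,", len(packed), "bytes after")
```

Lossy methods, by contrast, cannot pass the equality check above, which is why the choice between them is a per-project judgment.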
In terms of the amount of storage that’s dedicated to digitizing collections, Rieger says, “We’re not talking megabytes or gigabytes, we’re talking about petabytes.”
Today, the Library of Congress stores 50 petabytes of data in the cloud and on-premises, ensuring that history can be preserved and presented to future generations of Americans. That includes documents like those in the Parks collection.
“It’s really gratifying to work with a collection like the Rosa Parks collection because there’s so much interest in it and so much demand,” Michel says. “And for a lot of people who aren’t aware of the library and the services we provide and the richness of our collections, it’s a great opportunity to show and share with the world and help everyone access and see them, and appreciate them for what they are.”