
Library of Congress Uses Scanning and Storage Tech to Bring Rosa Parks’s Story to Life

The library has digitized thousands of documents and photos related to the civil rights icon.


Phil Goldstein is a former web editor of the CDW family of tech magazines and a veteran technology journalist. He lives in Washington, D.C., with his wife and their animals: a dog named Brenna, and two cats, Grady and Princess.

The Rosa Parks collection is just one example of how the Library of Congress uses scanning and digital storage technology to preserve American history, which the library sees as one of its core missions.

As the library notes in a video, it uses a variety of scanning technologies, from flatbed scanners to digital cameras mounted on copy stands, to capture a wide range of physical materials in this collection and others. Phil Michel, the digital project coordinator in the prints and photographs division at the Library of Congress, says that “digitization is really important for collections because it really gives us the opportunity to share these collections with the world.”

The Library of Congress houses more than 168 million items, but some are more tightly linked to the hinge points of American history than others. Such is the case with the library’s Rosa Parks collection, which “documents many aspects of Parks's private life and public activism on behalf of civil rights for African Americans,” as the LOC notes.

Kate Zwaard, director of digital strategy at the Library of Congress, noted that with the recent digitization and public release of Parks’s papers, citizens could, for the first time, “read accounts of that day that she experienced in her own hand.”

“And being able to see that really brings to life the experiences of American heroes,” Zwaard notes.

Library of Congress Moves to Scan and Store Digital Treasures

The library first received the Parks materials in late 2014 and formally opened them to researchers in the library’s reading rooms in February of 2015. The collection was first digitized and put online in February of 2016, and is a gift made to the library from the Howard G. Buffett Foundation.

The collection contains around 7,500 items in the library’s manuscript division, as well as 2,500 photographs in the prints and photographs division. Parks’s notes and correspondence describe the events surrounding her arrest in 1955 for disorderly conduct after she refused to give up her bus seat to a white passenger, as well as the subsequent Montgomery Bus Boycott, a key event in the civil rights movement.

Once the library scans documents, it then needs to store them. Thomas Rieger, manager of digitization services at the library, says “the long-term storage requirements are absolutely astonishing. They’re enormous. I would hesitate to put a number out there, because it’d be wrong by the time I say it. It’s really, really massive.”

In October, Librarian of Congress Carla Hayden announced a five-year plan for a “digital transformation” of the library. The plan calls for the library to “continue our aggressive digitization program,” which prioritizes the library’s unique treasures, and to “improve search and access services that facilitate discovery of materials in both physical and digital formats.”

The plan notes that “digital media are subject to degradation just like physical materials, and preserving the utility of older computer files requires trained technical expertise.” The library will work to ensure that digital items in its collection have “a verifiable chain of custody to ensure authenticity as objects are moved between storage media, updated, or migrated between formats.”

Additionally, the library says it will “continue to investigate and practice methods of emulation and migration to provide continued usability of files and programs as technology evolves.”
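To illustrate what a “verifiable chain of custody” can look like in practice, here is a minimal Python sketch of a fixity check: a checksum is recorded before a file is copied and compared again afterward. This is an assumption about one common approach, not a description of the library’s actual tooling; the function names and the file name `scan_0001.tif` are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity(source: Path, destination: Path) -> str:
    """Copy a file, then confirm the copy has the same checksum as the original."""
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    if actual != expected:
        raise ValueError(f"fixity mismatch for {destination}")
    return expected  # record this value alongside the file for later audits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical stand-in for a scanned page; not real collection data.
        original = Path(workdir) / "scan_0001.tif"
        original.write_bytes(b"stand-in bytes for a scanned page")
        migrated = Path(workdir) / "migrated_scan_0001.tif"
        checksum = copy_with_fixity(original, migrated)
        print("verified copy, sha256:", checksum)
```

Storing the returned checksum with the file would let a later audit recompute it and detect silent corruption on any storage medium.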

In August, Accenture was awarded the $27.3 million contract to build the long-planned new data center for the Library of Congress. The three-year contract calls for the construction of both a physical data center and other hosting environments, including cloud services, according to Roll Call.

Rieger notes that the cost of storage will continue to drop, which eases decisions about storing documents in high-quality formats with large file sizes.

“Recognizing how much it’s going to cost to store it is tempered somewhat by the recognition that, over time, that cost will go down,” he says. “So, we find that balance is really the answer there. There are ways to compress files that do no harm. There are other ways that do a tiny bit of harm, but nothing significant. We make that judgment on each project.”
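Compression that does “no harm” is usually called lossless: the original bytes can be restored exactly. The short Python sketch below shows that property using the standard library’s `zlib` module. It is only an illustration; the article does not say which formats or tools the library uses, and the sample data is a made-up stand-in for image bytes.

```python
import zlib


def compress_losslessly(data: bytes) -> bytes:
    """Compress bytes with zlib at the highest compression level."""
    return zlib.compress(data, 9)


def restore(compressed: bytes) -> bytes:
    """Recover the exact original bytes from zlib-compressed data."""
    return zlib.decompress(compressed)


if __name__ == "__main__":
    # Hypothetical stand-in for raw image data; highly repetitive on purpose.
    page = b"\x00\xff" * 50_000
    packed = compress_losslessly(page)
    # The round trip returns identical bytes, so nothing was lost.
    assert restore(packed) == page
    print(len(page), "bytes ->", len(packed), "bytes, restored identically")
```

Lossy methods, by contrast, discard some information to shrink files further, which is the “tiny bit of harm” trade-off Rieger describes weighing on each project.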

In terms of the amount of storage that’s dedicated to digitizing collections, Rieger says, “We’re not talking megabytes or gigabytes, we’re talking about petabytes.”

Today, the Library of Congress stores 50 petabytes of data in the cloud and on-premises, ensuring that history can be preserved and presented to future generations of Americans. That includes documents like those in the Parks collection.

“It’s really gratifying to work with a collection like the Rosa Parks collection because there’s so much interest in it and so much demand,” Michel says. “And for a lot of people who aren’t aware of the library and the services we provide and the richness of our collections, it’s a great opportunity to show and share with the world and help everyone access and see them, and appreciate them for what they are.”

Another Believer/Wikimedia Commons