The Main Reading Room of the Library of Congress in the Thomas Jefferson Building.

Smaller Agencies Still Manage to Digitize Huge Volumes of Data

Document scanning technology helps agencies preserve historical artifacts.

In April 2018, the U.S. Government Publishing Office and the National Archives’ Office of the Federal Register announced that they had digitized every issue of the Federal Register, dating back to the first one published in 1936. 

The Federal Register is the official journal of the federal government and contains agency rules, proposed rules, and public notices. The collection totaled 14,587 individual issues, which is nearly two million pages. To make them searchable online, the Federal Register issues were scanned using optical character recognition technology. 

The GPO had reason to digitize the documents, since it is the government’s official, digital, secure resource for producing, procuring, cataloging, indexing, authenticating, disseminating and preserving the official information products of the government.

Agencies large and small are using scanning and digital storage technologies to archive and preserve historical artifacts of all kinds. The Library of Congress, which houses more than 168 million items, is an obvious example. However, there are many other smaller agencies that also benefit from digitization, and many are able to use commercial technologies to do so. 

Thomas Rieger, manager of digitization services at the library, notes that smaller organizations like local historical societies or libraries can perform high-quality document digitization with some very basic equipment today. “It’s gotten good enough to do that,” he says. However, major institutions like a university or the Library of Congress “need very sophisticated, very high-quality equipment to do the job right.”

Notably, Rieger says, the cost of higher-end scanning equipment is amortized over time, and over a long period there is actually not much cost difference between using “the very best equipment” and something that is off the shelf.

Agencies Embrace Mass Digitization

The GPO is not the only smaller agency that engages in a lot of digitization. For example, the Smithsonian Institution is currently working to digitize its collection of 155 million items.

In 2015, the Smithsonian’s digitization program office digitized the National Numismatic Collection, America's collection of monetary and transactional objects. The 250,000 pieces of paper became the Smithsonian’s first full-production “rapid capture” digitization project. The term “rapid capture” refers to the speed of the workflow, according to the Smithsonian. Before this process was in place, digitizing a single sheet could take as much as 15 minutes, at a cost of $10 per sheet. Now, the team works through 3,500 sheets a day, at less than $1 per sheet.
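To put those figures in perspective, here is a rough back-of-the-envelope calculation. It treats the quoted numbers as upper bounds ("as much as 15 minutes," "less than $1") and assumes, purely for illustration, that all 250,000 sheets pass through a single workflow. The variable names are made up for this sketch and do not come from the Smithsonian.

```python
import math

# Figures quoted in the article; the per-sheet values are upper bounds.
SHEETS = 250_000
OLD_MINUTES_PER_SHEET = 15   # "as much as 15 minutes"
OLD_COST_PER_SHEET = 10.0    # dollars
NEW_SHEETS_PER_DAY = 3_500
NEW_COST_PER_SHEET = 1.0     # "less than $1", so this is a ceiling

# Assumption: every sheet goes through one workflow, one after another.
old_hours = SHEETS * OLD_MINUTES_PER_SHEET / 60
new_days = math.ceil(SHEETS / NEW_SHEETS_PER_DAY)

old_cost = SHEETS * OLD_COST_PER_SHEET
new_cost = SHEETS * NEW_COST_PER_SHEET

print(f"Old workflow: up to {old_hours:,.0f} hours and ${old_cost:,.0f}")
print(f"Rapid capture: about {new_days} working days and under ${new_cost:,.0f}")
```

Under these assumptions, the older process could have consumed up to 62,500 hours and $2.5 million for the collection, while rapid capture would need roughly 72 working days and less than $250,000.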

The DPO’s Mass Digitization Program supports the Smithsonian’s efforts to digitize its collections as comprehensively, quickly, and cost-effectively as possible. The program helps “build workflows, which move objects from storage to digital capture stations efficiently; by creating sustained high speed, high quality digitization processes.” 

The technologies involved in digitization change with the types of materials that are being digitized, according to Rieger. “For example, we use very specialized equipment to digitize film, photographic film, and the way we do that we would never use for anything else,” he notes. 

Fundamentally, the scanners that large institutions use may not be that different from the scanners used at many other agencies, Rieger says. Agencies like the Library of Congress use scanners with tri-linear arrays, meaning there is a row of red, a row of green and a row of blue sensors and a mechanical device that moves them across the object. “And, from that, you get some reasonably good scans,” he says.

Today, that is the highest quality method of digitizing, but there are caveats to that, according to Rieger. “There's all sorts of mechanical issues that happen if that transport isn't smooth, and we can measure that, we do measure that,” he says. “Because that's a failure point with that kind of scanner.”

However, when it comes to scanning technology, Rieger says agencies are “on the edge of something much better.” 

“Let's call it a revolution in the way that digitization is done, and these are things that you'll see in five to 10 years,” he says.


The Future of Scanning Technology

The human eye sees red, green and blue, but it doesn't see them as distinct separate objects. “The red and the green, actually, have a tremendous amount of overlap, and the eye and the brain, being the wonderful computer that it is, can figure all that out,” Rieger says. “Cameras, whether they're photographic film cameras or digital cameras, see those distinctions as very hard distinctions between red, green and blue, and the net of that is that your eye doesn't always see color the way that film or cameras do.”

That causes “definite problems if you're trying to actually reproduce things for posterity, for an archival situation like this,” he says. 

The way around that is spectral imaging, Rieger says, which takes very narrow wave bands of light and captures 10 to 20 of them. “If you do that, you can then map the way the eye sees color,” he says. “And you've captured what was really there, not what you thought was there. That's coming. We're working on it.” 

With spectral imaging, technical instruments can “measure that color very, very accurately,” Rieger says. 

“The eye won't see it that way, but that's what it really is,” he adds. “We can then either map that to the way the eye sees color or to what it was, and we would have different reasons for doing both.”
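The idea of capturing many narrow bands and then mapping them to how the eye responds can be illustrated with a small sketch. This is not the Library of Congress's method. The 16 band centers and the three bell-shaped sensitivity curves below are hypothetical stand-ins chosen for illustration, and a real workflow would rely on measured color-matching data rather than these made-up curves.

```python
import math

# Assumption: 16 narrow bands, 20 nm apart, from 400 nm to 700 nm.
BANDS_NM = [400 + 20 * i for i in range(16)]

# HYPOTHETICAL sensitivity curves as (peak nm, width nm). These are not
# measured data. The "long" and "medium" curves overlap heavily on purpose.
CURVES = {
    "long": (570.0, 50.0),
    "medium": (540.0, 45.0),
    "short": (445.0, 30.0),
}


def sensitivity(wavelength_nm, peak_nm, width_nm):
    """Bell-shaped weight for one wavelength under one curve."""
    return math.exp(-0.5 * ((wavelength_nm - peak_nm) / width_nm) ** 2)


def eye_response(reflectance):
    """Collapse per-band reflectance (0 to 1) into three weighted averages."""
    if len(reflectance) != len(BANDS_NM):
        raise ValueError("expected one reflectance value per band")
    response = {}
    for name, (peak, width) in CURVES.items():
        weights = [sensitivity(w, peak, width) for w in BANDS_NM]
        weighted = sum(r * w for r, w in zip(reflectance, weights))
        response[name] = weighted / sum(weights)
    return response


# A surface that reflects only wavelengths of 600 nm and longer.
reddish = [1.0 if w >= 600 else 0.0 for w in BANDS_NM]
print(eye_response(reddish))
```

Because the full per-band measurements are kept, the same capture can be stored as measured or converted through whichever set of curves is appropriate later, which is the flexibility Rieger describes.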

Carol M. Highsmith/Wikimedia Commons
Mar 08 2019
