Jan 24 2025
Cloud

Legislative Agencies Adopt Paperless Strategies to Publish and Archive Records

Cloud plays a vital role in federal digital publishing and archiving.

The Government Publishing Office produces and distributes official government publications to Congress, agencies, federal depository libraries and the public.

Specifically, GPO publishes the Congressional Record, a daily written account of the previous day’s congressional proceedings; the Federal Register, the official journal of the government; and GovInfo, a platform for accessing and preserving official documents from the three branches of government.

“With the Congressional Record, we are essentially producing a daily newspaper,” says GPO Director Hugh Halpern. “Our staff compiles the printouts and electronic documents that Congress sends over. Our role is proofreading rather than editing, making sure our final outputs reflect Congress’s intentions.”

The electronic files sent by Congress are formatted in U.S. Legislative Markup (USLM), a standardized XML data format that supports easier downloads and repurposing. Currently, these files must go through several formatting changes, including conversion to GPO’s typesetting code, before they are ready for physical publishing of the Congressional Record and its PDF version.

All of this stems from the legislative branch’s requirements for digital, not physical, documents. GPO and the Library of Congress are engaged in IT modernization, centered on cloud technology, to more effectively manage the enormous amounts of data and information generated by the federal government.

“This is a really big shift with a lot of implications on records management across federal agencies,” says Professor Richard Marciano of the University of Maryland, founder of the Advanced Information Collaboratory.

The shift is necessary to keep federal agencies from accumulating backlogs as data volumes continue to grow, he adds.

Simplify Electronic Files to Modernize

Under Halpern’s leadership, GPO is simplifying and modernizing how it prepares its electronic files. The agency is moving away from MicroComp, its decades-old composition software, and transitioning to XPub for its work on the Congressional Record. XPub, GPO’s new composition engine based natively in XML, will allow the agency to process data in a variety of modern formats.

“Once fully implemented, XPub will cut out a lot of the steps in our process, allowing us to go from an XML input to a finished PDF more quickly and efficiently,” Halpern says. “It will also be able to create responsive HTML, making it easy to read on a computer.”
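To make the XML-to-HTML idea concrete, here is a minimal Python sketch. It is not XPub and does not use the real USLM schema: the element names (`proceeding`, `heading`, `speech`, `para`), the sample text and the `render_html` helper are all hypothetical, chosen only to show how structured XML can be turned into web-readable markup in a single step. Responsive layout would come from CSS, which is not shown.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified markup for illustration; NOT the real USLM schema.
SAMPLE_XML = """<proceeding date="2025-01-23">
  <heading>Morning Business</heading>
  <speech speaker="Mr. EXAMPLE">
    <para>I rise to speak on the measure before us.</para>
    <para>Costs &amp; benefits must both be weighed.</para>
  </speech>
</proceeding>"""


def render_html(xml_text: str) -> str:
    """Convert the sample XML structure into a simple HTML fragment."""
    root = ET.fromstring(xml_text)
    parts = ["<article>"]
    # The heading becomes the top-level title of the fragment.
    parts.append(f"<h1>{escape(root.findtext('heading', default=''))}</h1>")
    for speech in root.iter("speech"):
        # Each speech becomes a section labeled with its speaker.
        parts.append(f"<section><h2>{escape(speech.get('speaker', ''))}</h2>")
        for para in speech.findall("para"):
            # Re-escape text so characters like "&" stay valid in HTML.
            parts.append(f"<p>{escape(para.text or '')}</p>")
        parts.append("</section>")
    parts.append("</article>")
    return "\n".join(parts)


html_out = render_html(SAMPLE_XML)
print(html_out)
```

Because the structure lives in the XML itself, the same input could feed a second renderer for print or PDF without an intermediate typesetting format, which is the kind of step reduction Halpern describes.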

The original paper copy serves as the record of Congress, which is why GPO goes to great lengths to make sure all versions, whether printed or digital, reflect the original. As finalized PDFs are distributed to GovInfo, the National Archives, the Library of Congress and Congress itself, maintaining security is key.

“We assure integrity with a great deal of metadata confirming that custody has been securely maintained throughout the process,” Halpern says. “GovInfo has ISO 16363:2012 certification as an authentic, trustworthy digital repository, as well as CoreTrustSeal certification. Maintaining this trust is important to our stakeholders.”
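One common way to back up a custody claim with metadata is a fixity check: record a cryptographic digest each time a file changes hands, then confirm that the file still matches every recorded digest. The Python sketch below illustrates that general technique only. It is an assumption about how such metadata might work, not a description of GPO's system or of what the certifications require, and the `CustodyEvent` record, holder names and sample bytes are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CustodyEvent:
    holder: str  # who held the file at this step (hypothetical names)
    sha256: str  # digest recorded at hand-off


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def custody_intact(data: bytes, events: list[CustodyEvent]) -> bool:
    """True if a log exists and every recorded digest matches the current bytes."""
    current = sha256_hex(data)
    return bool(events) and all(event.sha256 == current for event in events)


# Placeholder bytes standing in for a finalized PDF.
document = b"%PDF-1.7 example bytes standing in for a finalized PDF"
digest = sha256_hex(document)
log = [CustodyEvent("composition", digest), CustodyEvent("repository", digest)]

print(custody_intact(document, log))                 # unchanged file
print(custody_intact(document + b" tampered", log))  # altered file
```

Any change to the bytes, however small, produces a different digest, so a mismatch at any step flags a file that no longer reflects what was originally recorded.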

GPO has a cloud stack supporting its daily work on the Congressional Record and the Federal Register, as well as an on-premises data center for some of its workloads. Cloud technology plays an important role in supporting GPO’s GovInfo platform, which serves as both a searchable database for accessing federal publications and a repository for preserving those documents.

“From an archival standpoint, we keep a copy of the GovInfo data in Azure cloud,” Halpern says. “This gives us a backup for all of that valuable information.”

GPO produces its publications with HP computers and maintains archives with NetApp cloud storage.

Improving Collection Access with an Open-Source Platform

The Library of Congress — in addition to managing the Congressional Research Service, Congress.gov and the U.S. Copyright Office — is the world’s largest library.

“Our digital collection is enormous,” says LOC Director of Digital Services David Brunton. “The preservation copy of our collection totals 28.7 petabytes, including more than 1 billion objects. Our lower-resolution online presentation copies account for over 6 petabytes of storage.”

Using its Digital Collections Strategy as a guide, LOC continues to acquire, digitize and make available its collections to researchers and the public. In support of this strategy, the library is switching to a new IT solution, the Library Collections Access Platform.

“LCAP is an open-source project using EBSCO’s FOLIO library services solution,” Brunton says. “It enables better search and discovery of collections, improving overall access for our users. We already use the system for acquisitions work, and we expect to roll it out to end users for search and circulation in 2025.”

The FOLIO open-source solution is hosted in the AWS cloud. LCAP will provide a simple, accessible window into a very complex process going on underneath, namely the metadata tagging and organizing of millions of unique items in the collection.

“You’d think building a simple ingest pipe to storage would be easy. It is not,” says CIO Judith Conklin. “There is a lot of metadata to be managed, different formats and types of collections. As vast as our collection is, without that metadata, you wouldn’t be able to find anything at all.”

10.8B

The number of retrievals of government information from GPO’s GovInfo platform since its inception in June 1994

Source: GPO, “U.S. Government Publishing Office: Annual Report 2023,” April 2024

Storage for a digital collection of this size and scope is a big consideration. In total, LOC uses 175 petabytes of storage, Conklin says, some of which can be attributed to the library’s 3-2-1 data storage methodology: three copies, two storage technologies, one copy geographically dispersed to two different locations.
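The 3-2-1 rule is simple enough to express as a check. The Python sketch below reads it as at least three copies, on at least two storage technologies, spread across at least two locations. The `Copy` record, the technology labels and the site names are hypothetical and do not describe LOC's actual storage layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    technology: str  # e.g. "disk", "tape", "cloud-object" (illustrative labels)
    location: str    # site identifier (hypothetical)


def meets_3_2_1(copies: list[Copy]) -> bool:
    """Check for three copies, two technologies and two distinct locations."""
    technologies = {copy.technology for copy in copies}
    locations = {copy.location for copy in copies}
    return len(copies) >= 3 and len(technologies) >= 2 and len(locations) >= 2


good = [Copy("disk", "site-a"), Copy("tape", "site-a"), Copy("cloud-object", "site-b")]
bad = [Copy("disk", "site-a"), Copy("disk", "site-a"), Copy("disk", "site-a")]

print(meets_3_2_1(good))  # three copies, three technologies, two sites
print(meets_3_2_1(bad))   # three copies, but one technology at one site
```

The second case shows why the copy count alone is not enough: three identical copies on the same technology at the same site would all be lost to a single hardware fault or site outage.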

“The collections are very historic and important, so we need to be careful with it all,” Conklin says. “Even though we are putting things in the cloud, we always have a preservation copy on-premises in case something happens with our contract and we have to move to another cloud provider.”

Accessing adequate storage is an ongoing challenge for LOC. Unlike a typical library, which can cull its collections and remove items from public access to meet space requirements, LOC sees its collections continue to grow.

“We have a practical mission of providing access, but also a historic mission to preserve our nation’s treasures,” Conklin says. “We do not remove or purge our collections, which presents a challenge because our storage needs will always be increasing.”

Photo courtesy of GPO