Among the documents in the Clinton Presidential Library: this photographic contact sheet, a paper record of pictures taken during Bill Clinton's term in office.

Apr 20 2021

How Government Agencies Archive the White House’s Electronic Records

The process of preserving a presidential administration’s information has grown increasingly complex.

When a U.S. president left office 40 years ago, archival teams swept in to gather almost every piece of paper, picture, audio or video tape that the president and his staff had generated. 

They carried boxes of these official documents from the White House to the National Archives and Records Administration, which stores them and makes them accessible to the public in perpetuity.

By the late 2000s, as more of those records came in electronic form, workers swapped out boxes for giant servers, loaded them onto trucks and drove them to their new quarters when the presidency turned over. The records were transferred onto servers in a NARA data center.

Following the most recent presidential transition, hundreds of millions of digital files are now traveling from the White House’s cloud to NARA’s cloud-based Electronic Records Archives. By law, NARA takes legal custody of a president’s official records the moment he leaves office.

“We’re keeping this for the life of the Republic,” says John Laster, director of NARA’s White House Liaison Division. “These records have to be in an environment that allows us to ensure they are stable and that we’re not losing access.”
 

Managing a Growing Volume of Presidential Records

This monumental task has grown all the more challenging with the sheer volume of documents. When Bill Clinton — the first president to use email — left office in 2001, archivists tallied about 20 million emails and 3 terabytes of data. Eight years later, when George W. Bush reached the end of his second term, the number of presidential emails had jumped to 220 million, and data skyrocketed to 87TB.

Barack Obama left more than 300 million emails and 250TB of data in his eight years. After just four years in office, however, Donald Trump doubled the amount of data to 500TB, NARA estimates.
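To put those totals on a common footing, the short sketch below divides each administration's data volume by its years in office. It uses only the figures quoted above; the function names are illustrative and not part of any NARA tooling.

```python
# Figures quoted in the article: (terabytes of data, years in office).
ARCHIVES = {
    "Clinton": (3, 8),
    "Bush": (87, 8),
    "Obama": (250, 8),
    "Trump": (500, 4),
}


def tb_per_year(name: str) -> float:
    """Average terabytes of records generated per year in office."""
    terabytes, years = ARCHIVES[name]
    return terabytes / years


def growth_factor(earlier: str, later: str) -> float:
    """How many times larger the later archive is than the earlier one."""
    return ARCHIVES[later][0] / ARCHIVES[earlier][0]


if __name__ == "__main__":
    for name in ARCHIVES:
        print(f"{name}: {tb_per_year(name):.1f} TB per year")
    print(f"Bush vs. Clinton total: {growth_factor('Clinton', 'Bush'):.0f}x")
    ratio = tb_per_year("Trump") / tb_per_year("Obama")
    print(f"Trump vs. Obama per year: {ratio:.0f}x")
```

Because the Trump total doubled in half the time, the per-year rate works out to four times that of the Obama years.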

“The volume just of all electronic records is staggering, and it’s transforming how we deal with that archive,” says Gary Stern, NARA’s general counsel.

“The majority of the time that’s spent isn’t necessarily related to the actual physical move from one media to the other or into the cloud,” he says. “It is all of the steps to get it in an appropriate format, then to us, then verified.”

A Brief History of Presidential Record-Keeping 

Congress passed the Presidential Records Act in 1978, largely in response to concerns that President Richard Nixon would destroy documentation related to the Watergate scandal. 

“It established that the records of the president are actually government property, because before then, they were not,” Stern says. “They were the personal property of the president, who could do whatever he wanted with them.”

The law defines which records require preservation and which belong personally to the president or vice president. Those official records must be maintained through the end of the president’s term, when NARA takes control.

The related Federal Records Act requires that agencies keep and store documents so they can be easily retrieved. The 1950 law did not foresee electronic records. “The web evolved faster than guidance evolved for what to do with this information,” says Dory Bower, archive specialist in library services and content management for the U.S. Government Publishing Office.

In 2008, the GPO and the Library of Congress teamed up with other agencies and academic partners to create the End of Term Web Archive, which captures government web pages and other online content that might be removed or discarded when an administration changes. This collection happens every four years, even if a president earns a second term.

New administrations generally change the look and content of agency websites, and “that means information that was readily available to the public before suddenly becomes harder to find,” says Malea Walker, reference librarian for the LOC’s serial and government publications division. 

“By documenting websites before and after these transitions, the library ensures not just the preservation of that information, but also the accessibility of that information to the American people.”

The LOC’s archives focus on cabinet-level agency websites and other specific subjects or event-related information. The library uses a web crawler and an in-house curatorial tool to collect material. An outside contractor then does the major harvesting, Walker says.

“There is so much in the crawls that we can’t possibly look at it all to make sure we captured it well,” says Abigail Grotke, assistant head of LOC’s digital content management section. “We are constantly looking at automation of workflows and processes as much as possible.”

500TB: The amount of electronic data created by President Donald Trump from 2017-2020. Source: National Archives and Records Administration

The Technical Details Behind Archiving the White House’s Records 

For presidential records, the process starts at the White House, which exports files from its systems and reformats them into the neutral archival format that NARA requires.

“And then, once we get it, we have to do a whole lot of steps to get it fully ingested into our Electronic Records Archives,” Stern says. “So even when we get it and load it, then we have to do all sorts of validation, verification, standard IT checks to make sure we got it all, we can read it all, and then index and format it. It’s pretty complex.”
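The kind of "did we get it all, and can we read it all" check Stern describes is often done by comparing checksums before and after a transfer. The following is a minimal sketch of that general technique, not NARA's actual tooling: the manifest layout and the function names (`sha256_of`, `build_manifest`, `verify_transfer`) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_transfer(manifest: dict[str, str], received_root: Path) -> dict[str, list[str]]:
    """Compare the sender's manifest against the files actually received."""
    received = build_manifest(received_root)
    report: dict[str, list[str]] = {"missing": [], "corrupted": [], "unexpected": []}
    for name, expected in manifest.items():
        if name not in received:
            report["missing"].append(name)      # never arrived
        elif received[name] != expected:
            report["corrupted"].append(name)    # arrived, but contents differ
    report["unexpected"] = sorted(set(received) - set(manifest))
    return report
```

In this sketch the sending side would run `build_manifest` before export and ship the result alongside the files. The receiving side would then run `verify_transfer` and treat any non-empty list in the report as a reason to re-request or investigate those files.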

Both the White House and NARA generally contract with developers to write the code that converts most records from proprietary native formats into more accessible ones, Laster says. For example, a database might be exported into a series of .csv or .txt files, which are more appropriate for preservation.
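As a rough illustration of the database-to-.csv export Laster mentions, the sketch below dumps every table in a SQLite database to its own CSV file using only Python's standard library. SQLite stands in here for whatever proprietary system holds the records, and the function name `export_tables_to_csv` is hypothetical. The article does not describe the actual conversion code.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(connection: sqlite3.Connection, out_dir: Path) -> list[Path]:
    """Write each user table to <out_dir>/<table>.csv with a header row."""
    out_dir.mkdir(parents=True, exist_ok=True)
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    written = []
    for table in tables:
        # Quote the identifier so unusual table names cannot break the query.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = connection.execute(f"SELECT * FROM {quoted}")
        target = out_dir / f"{table}.csv"
        with target.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow([column[0] for column in cursor.description])
            writer.writerows(cursor)
        written.append(target)
    return written
```

Plain-text formats such as CSV trade away database features like indexes and types, but they can be read without the original software, which is the property that matters for long-term preservation.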

Trump’s single term created another challenge. With a two-term president, the archive teams can start preparing several months earlier. NARA hadn’t handled a single-term transition since George H.W. Bush left in 1993, when most records were still paper and the volume was much smaller.

The Obama archive took about six months, starting in May 2016, Stern says. Facing the greater volume of the Trump archive, the team began the job in earnest early this year. Not surprisingly, they’re still at it.


Clinton Digital Library