Jun 03 2025
Artificial Intelligence

Intelligent Document Processing Drives Efficiency in Agencies

Officials can quickly extract and understand data with artificial intelligence.

Leaders at the National Archives and Records Administration are striving to make the nation’s historical documents more accessible. “At the archives, we have 13 billion paper documents, and we have been working very actively to digitize as much as possible,” says Jill Reilly, acting chief innovation officer at NARA.

Here, digitization goes beyond optical character recognition (OCR), which makes text machine-readable.

NARA is starting to leverage artificial intelligence to power intelligent document processing, or IDP — a means of interpreting and classifying data from a range of document types.

Experts say IDP has powerful potential to drive efficiencies in the federal space.

“We have many reasons to collect data. In many cases, it is mandated. But we need to make sense of that data. How do we interpret it? How do we extract things in a timely manner, both for internal efficiency and efficient customer service?” says Alan Shark, executive director of the Public Technology Institute (PTI).

IDP offers an answer, he says, by taking “existing technologies — optical character recognition, natural language processing — and now we add AI to create structured, actionable data.”

A More Searchable NARA Catalog

At NARA, Reilly has been using Amazon Textract, and she sees big possibilities.

“We have about 390 million digital objects in the Catalog, and it continues to grow. We’re working on a big goal of getting to 500 million digitized pages by October 2026,” and AI will help make sense of all those documents, she says.

AI is great at “identifying people’s names, place names, dates and occupations,” which in turn makes it easier for people to search the Catalog, she says. IDP has proved itself capable of “pulling out data and tagging it and identifying it, if it’s about a person, a topic, an event, a date. It’s reading those key concepts and pulling them out of the extracted text.”

NARA did this, for example, with the 1950 census. “We trained Textract and some other tools from Amazon to read the census forms that were taken by hand and know where to anticipate a name, an age, a date, a place, a street name,” she says.

“Then we were able to have those tags pulled out and labeled as AI-generated tags,” which made the data easily searchable, she adds.
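The idea Reilly describes, a tool that knows where each field sits on a handwritten form, can be illustrated with a minimal sketch. It assumes the text has already been extracted into rows of cells. The column layout, the field names, the sample row and the `tag_row` helper are all hypothetical and do not represent NARA's or Amazon's implementation.

```python
# Hypothetical column layout for one line of a census-style form.
# Real forms have many more columns; these four are for illustration only.
FORM_TEMPLATE = {0: "street", 1: "name", 2: "age", 3: "birthplace"}


def tag_row(cells, template=FORM_TEMPLATE):
    """Turn one extracted row into labeled tags, skipping blank cells."""
    tags = []
    for column, field_name in template.items():
        if column < len(cells) and cells[column].strip():
            tags.append({
                "field": field_name,
                "value": cells[column].strip(),
                "source": "AI-generated",  # label every machine-made tag
            })
    return tags


# Made-up sample row; the last cell is blank and produces no tag.
row = ["Maple St", "Mary Smith", "34", ""]
for tag in tag_row(row):
    print(tag)
```

Because each tag records its field and its source, a search index built on this output could filter by name, age or place while still showing users which tags were machine-generated.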

AI-Driven IDP Streamlines Operations

Reilly describes the power of IDP as critical to mission success.

The records of the National Archives represent “people’s stories and our shared history. People need to be able to search and find things that could be related to their family and their ancestors,” she says.

In this context, IDP “could help them understand local history, find out what was going on in 1950, who lived there and where they live now. There is a lot of content here for historians, students and teachers,” she says.

Making documents not only digital but also searchable is the key to success here. Searchability “makes that online user experience friendlier and more efficient,” Reilly says.

“We are focused on streamlining that customer experience, being responsive to the customer feedback that we gather from our online surveys and focus groups,” she says. “We get feedback about how everyone would like to see more searchable online content from the Archives, and we’re really working to improve that access.”

37: The number of public websites managed by the National Archives and Records Administration
Source: archives.gov, “National Archives by the Numbers,” Feb. 12, 2025

Other agencies could see similar benefits, according to Amy Jones, U.S. public sector AI lead at EY. “IDP offers a low-effort, high-reward opportunity to enhance existing systems,” she says.

For example, many agencies use modern ticketing systems to improve visibility and automate workflows, but those systems still rely on manual processes for routing tickets and providing updates.

“By incorporating AI-driven IDP, agencies can eliminate these bottlenecks, streamline operations and ensure more accurate, real-time information throughout the process,” Jones says.

Revolutionary Data Extraction and Transparency

In addition to applying Textract to census data, NARA has used IDP to open access to tens of thousands of pages of Revolutionary War–era documents.

“We conducted a project with FamilySearch, one of our digitization partners,” to extract text from handwritten Revolutionary War–era pension files, Reilly says.

The Archives shared about 30,000 pages that had been manually transcribed by NARA’s Citizen Archivist community, “and we were able to use that as a ground-truth data set to train the text extraction tools,” she says. The partner organization then used the tools to extract text from just over 2 million additional documents, “and we are sharing them with the public through the Catalog in a very transparent way.”

To achieve those kinds of results, she says, agencies need to have ground rules in place.

“First and foremost, it’s about using AI and machine learning tools in a trustworthy, risk-aware way. The National Archives is a leader in responsible and trustworthy use of AI in the Archives and library science field,” she says.

That means, in part, that “you maintain the layers, so you have the authentic copy of the document that you stand by as authentic, original,” she says. Then, it’s important to ensure “that the metadata is labeled as AI/ML-generated, so our citizens have that visibility into the sources of the information that we’re sharing online.”
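One way to picture the layering Reilly describes is a record that keeps the original untouched and attaches every derived item with a provenance label. The sketch below is illustrative only: the class names, field names, identifier and file path are assumptions, not NARA's Catalog schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OriginalScan:
    """The authentic copy; frozen so later processing cannot alter it."""
    catalog_id: str
    image_path: str


@dataclass
class DerivedLayer:
    """Content layered on top of the original, with its provenance."""
    kind: str          # e.g. "extracted_text" or "tag"
    content: str
    generated_by: str  # "AI/ML" or "human"


@dataclass
class CatalogRecord:
    original: OriginalScan
    layers: list = field(default_factory=list)

    def add_layer(self, kind, content, generated_by):
        self.layers.append(DerivedLayer(kind, content, generated_by))

    def ai_generated(self):
        """Return only the layers a user should see flagged as AI/ML."""
        return [layer for layer in self.layers if layer.generated_by == "AI/ML"]


# Hypothetical identifier and path, for illustration only.
record = CatalogRecord(OriginalScan("example-001", "scans/example-001.tif"))
record.add_layer("extracted_text", "sample transcription", "AI/ML")
record.add_layer("tag", "pension file", "human")
print(len(record.ai_generated()))
```

Keeping provenance on each layer, rather than on the record as a whole, lets human transcriptions and machine output sit side by side without blurring which is which.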

Overall, PTI’s Shark sees IDP benefiting federal workers at a time when job satisfaction may be under strain.

“If you’re acting like a machine and all you do is process the same thing over and over again, you’re bored, you become less efficient, and you might be making mistakes,” he says, adding that AI-informed IDP can help improve both the process and the product.

IDP and Government Efficiency Go Hand-in-Hand

Other agencies are also looking to leverage intelligent document processing.

The Department of Energy, for example, reports achieving more than 92% precision when using Google Document AI to extract information from test data.
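For readers unfamiliar with the metric, precision is the share of extracted values that turn out to be correct: true positives divided by everything the tool extracted. The sketch below shows the arithmetic on made-up data. The field names and values are hypothetical and have no connection to the Department of Energy's test set.

```python
def precision(extracted, ground_truth):
    """Share of extracted (field, value) pairs that match the ground truth."""
    extracted = set(extracted)
    if not extracted:
        return 0.0
    true_positives = len(extracted & set(ground_truth))
    return true_positives / len(extracted)


# Made-up example data, not DOE results.
truth = {
    ("date", "1950-04-01"),
    ("name", "Mary Smith"),
    ("age", "34"),
    ("place", "Ohio"),
}
found = {
    ("date", "1950-04-01"),
    ("name", "Mary Smith"),
    ("age", "84"),   # misread digit, so this pair counts as a false positive
    ("place", "Ohio"),
}
print(precision(found, truth))  # 3 correct out of 4 extracted -> 0.75
```

Precision alone does not penalize fields the tool failed to extract at all; that is measured separately as recall.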

There’s urgency these days around government efficiency, and Shark expects that to bring IDP to the fore.

“There’s an open desire to cut some of the fat that exists in government,” he says. “These tools become far more important at a time when there is such political pressure to do more with less.”

In document processing, “the whole issue is about accessibility and retrieval, getting the information into the hands of the right people, at the right time, very quickly — not having to wait weeks or days or months,” he says.

Photography by Valerie Chiang