Feb 16 2022

Algorithms May Take Over the Job of Scanning Dense Federal Documents

Artificial intelligence can take the drudgery of poring through minutely detailed reports away from human readers.

Every year, accountants at the Bureau of the Fiscal Service must read through at least 2,000 densely written pages of federal appropriations bills to determine how much money each government agency should receive. It’s a four-week marathon to create about 200 “warrants” that authorize agencies to spend their new appropriations.

The bureau is experimenting with artificial intelligence to speed the drudgery-filled process, hoping to use machine learning and natural language processing to train an algorithm to interpret legislation.

It would identify the three primary pieces of a warrant — the purpose of the funding, the dollar amount and the period during which the money can be spent.
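To make those three warrant components concrete, here is a deliberately simple rule-based sketch in Python. It is not the bureau's machine learning approach, which the article does not describe in detail. It assumes appropriations clauses follow the familiar pattern "For necessary expenses of ..., $X, to remain available until ...", and the sample clauses, regular expressions and the names `WarrantFields` and `extract_warrant_fields` are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class WarrantFields:
    """The three pieces of a warrant named in the article (hypothetical container)."""
    purpose: Optional[str]
    amount: Optional[int]   # whole dollars
    period: Optional[str]   # period of availability, as written in the bill


# Hypothetical patterns; real bill text varies far more than this.
# Purpose: text after "For [necessary expenses of]" up to the first comma or dollar sign.
PURPOSE_RE = re.compile(r"^For (?:necessary expenses of |expenses of )?(?P<purpose>[^,$]+)")
# Amount: a dollar figure with or without thousands separators.
AMOUNT_RE = re.compile(r"\$(?P<amount>\d+(?:,\d{3})*)")
# Period: either a fixed end date or "until expended" (no-year money).
PERIOD_RE = re.compile(
    r"to remain available (?P<period>until [A-Z][a-z]+ \d{1,2}, \d{4}|until expended)"
)


def extract_warrant_fields(clause: str) -> WarrantFields:
    """Pull purpose, dollar amount and availability period from one clause, if present."""
    purpose = PURPOSE_RE.search(clause)
    amount = AMOUNT_RE.search(clause)
    period = PERIOD_RE.search(clause)
    return WarrantFields(
        purpose=purpose.group("purpose").strip() if purpose else None,
        amount=int(amount.group("amount").replace(",", "")) if amount else None,
        period=period.group("period") if period else None,
    )


if __name__ == "__main__":
    clause = ("For necessary expenses of the Office of Inspector General, "
              "$12,345,000, to remain available until September 30, 2023.")
    print(extract_warrant_fields(clause))
```

Real appropriations language breaks patterns like these constantly (provisos, transfers, amounts split across sentences), which is likely part of why a trained model is attractive here instead of hand-written rules.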

“It’s still very much a work in process,” says Stephen Keller, a senior data scientist in BFS’ office of the chief data officer.

Nevertheless, the test was 85 percent accurate in determining funding recipients. “Our focus is not 100 percent accuracy,” he says. “Hitting 85 percent helps people.”


Technology Can Help End the Grunt Work for Feds

The push to develop AI to help read complex and lengthy government and scientific documents is gaining momentum, especially at the Department of Energy’s national laboratories. One key obstacle, however, is the PDF, a format that stores information in a way more akin to an image than to editable text.

“It’s an extremely difficult problem,” says Robert Patton, who leads the Learning Systems Group at Oak Ridge National Laboratory. “From a Word document, we can easily access information and do natural language processing.”

Patton is developing technology that uses both machine learning and image analysis to create more accurate PDF-to-text extraction tools. The system can also identify tables in PDFs and pull them out in a structured format for machine analysis, but that needs further development, he says.

2,126

The number of pages in the final federal budget for fiscal year 2021

Source: Consolidated Appropriations Act, 2021

Research scientist David Butler at Energy’s Lawrence Livermore National Laboratory was among the authors of a paper examining the use of AI to extract recipes for nanoparticle synthesis from previously published articles.

Scaling up nanoparticle production requires specific information on temperature, time and mixing. “It’s always harder than you think it’s going to be,” Butler says. “Chemistry is particularly hard to recognize in papers.”
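Butler's point is easy to see with a naive extractor. The sketch below is not the method from the Livermore paper; it simply pulls temperatures and durations out of a synthesis sentence with regular expressions and normalizes times to minutes. The sample sentence, the patterns and the function name `extract_conditions` are hypothetical. It misses common variants such as "℃", "overnight" or "room temperature", ignores mixing conditions entirely, and has no idea which step of a recipe a value belongs to.

```python
import re

# Hypothetical pattern: a number followed by a degree sign and "C", e.g. "180 °C".
TEMP_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s*°\s*C")
# Hypothetical pattern: a number followed by an hour, minute or second unit.
# Longer unit spellings come first; \b keeps "s" from matching the start of a word.
TIME_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(hours?|hrs?|h|minutes?|mins?|seconds?|secs?|s)\b"
)
# Convert each unit to minutes, keyed by the unit's first letter.
MINUTES_PER_UNIT = {"h": 60.0, "m": 1.0, "s": 1.0 / 60.0}


def extract_conditions(text: str) -> dict[str, list[float]]:
    """Return every temperature (°C) and duration (minutes) found in the text."""
    temps = [float(m.group(1)) for m in TEMP_RE.finditer(text)]
    times = [float(value) * MINUTES_PER_UNIT[unit[0]]
             for value, unit in TIME_RE.findall(text)]
    return {"temperatures_c": temps, "times_min": times}


if __name__ == "__main__":
    sentence = ("The precursor was heated to 180 °C for 12 h, "
                "then cooled to 25 °C and stirred for 30 min.")
    print(extract_conditions(sentence))
    # {'temperatures_c': [180.0, 25.0], 'times_min': [720.0, 30.0]}
```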


Agencies Use Deep Learning to Scan Through Documents

Researchers at Argonne National Laboratory needed to find the best dye for making a certain solar cell, requiring a search through 40 years of published papers and journals, says Alvaro Vazquez-Mayagoitia, a principal software development specialist.

He relied on deep learning tools and standard natural language processing to pull information from text, tables and captions. Machine learning — combined with the assistance of the lab’s supercomputer — also helped to find the correct information.

“Information published previously should be rescued,” Vazquez-Mayagoitia says. “These tools do the job.”


