There are hundreds of federal agencies, according to the Federal Register, and many agencies have multiple bureaus and departments. Each agency is collecting data in some form — but can any of them make sense of it all?
Last week, a group of federal-data experts convened to try to answer that question. The key problems are that data is siloed among different departments, is collected for different purposes and is often not shared. Additionally, agencies must try to reconcile the structured and unstructured data they collect.
In a panel discussion called 360 Degrees of Government Data, officials from the U.S. Army; Customs and Border Protection; the Department of Health and Human Services; the General Services Administration; and MarkLogic talked about how best to make sense of and use all of the data flowing inside the federal government.
Silos Make It Tough to Share Data
Col. Linda Jantzen, acting director of the Army Architecture Integration Center, said the Army deals with many different types of data. Some of it is “tactical” and relates to what is happening in the field and in theaters of war. Other data is more “functional”: it is often generated and stored at Army bases and other fixed installations, and can include logistics, intelligence, engineering and medical data.
“All of them are data creators and data owners and heretofore have built up their data stores and how they use their data and are not used to opening it up and sharing it with others,” Jantzen said.
The Army also keeps classified and unclassified data in separate silos. “And what this all adds up to is a great difficulty in sharing information across those environments, across those domains and across those functions,” said Jantzen, who is also the Army’s chief data officer. “And we're getting to the point where the data is pretty much overwhelming, and we are outpacing our ability to really find it, use it, store it and understand it.”
George Chambers, executive director of enterprise application development within the office of the CIO at the Department of Health and Human Services, said that HHS has several large departments that could be independent agencies themselves. Those include the Food and Drug Administration; the Centers for Medicare & Medicaid Services; the National Institutes of Health; and the Centers for Disease Control and Prevention.
“We have silos of mission-specific data and mission-specific applications supporting those,” Chambers said, adding that the data can be related to research or can report on what is happening in a given area or with a specific disease.
Overall, the agency is focused on providing “data to the public or businesses that may affect the health or well-being of the overall U.S. population.” As a result, Chambers said, HHS decided in 2011 to create healthdata.gov so that it could provide more “customer driven” data. The portal makes data and use cases available on demand to those who want to consume and manipulate the information, so that HHS is not responsible for determining which data sets people would find useful.
Dealing with Structured and Unstructured Data
The panelists also discussed the difference between structured data — information that can be easily sorted, searched and processed through data mining — and unstructured data, which includes data from a wide variety of sources that has no meaning until it is organized. Kevin Shelly, the group vice president of global public sector sales at MarkLogic, said relational databases are not well suited for handling a world awash in unstructured data.
Most data today is unstructured, Shelly noted, and includes sources such as video, images, open-source intelligence and social media. The shift will require cultural changes within agencies so that they embrace unstructured data and find ways to make sense of it.
Jantzen said dealing with structured and unstructured data “is not a new problem in the Army” and is something the entire Department of Defense knows it must address. She said the goal is to ensure that the DoD embraces recognized standards while also maintaining security of its data.
“The good news is we are making great headway on that with something we call the Common Operating Environment, which is already under way,” Jantzen said. That initiative is aimed at orienting the Army around a common set of IT standards and architectures.
“We have to allow that freedom to operate across environments and across domains, but at the same time enable that data sharing,” she said.
Meanwhile, Wolf Tombe, CTO of U.S. Customs and Border Protection, noted that the agency has data streams, such as cargo and passenger data, coming into its relational databases but also has vast amounts of unstructured data from sensors along the border, including video and data from drones.
“The promise of Big Data is giving us the ability to digest that data, and make use of that [in a] fairly straightforward … way that saves money and allows us to get that and combine it with structured data,” he said.