In his first full week on the job as the assistant director for cybersecurity at the Cybersecurity and Infrastructure Security Agency, Bryan Ware came to a realization: His Department of Homeland Security agency collects enormous amounts of data.
“Fifty-five billion NetFlow records on any given day, approaching 2 terabytes of compressed binaries a day,” he said in a speech in January, adding with a smile, “That’s, like, a lot.”
The government is awash in data, housed in on-premises data centers and in private, public, hybrid and multicloud environments. Agencies understand the value of all that information, but can’t always access it in a timely manner, or share it with enough context to make it useful.
“Even with that volume, or because of that volume, it’s hard to see things that are important,” Ware noted.
A new federal data strategy is designed to help agencies overcome that hurdle. It calls for agencies to determine what upgrades their data infrastructures need, identify the kinds of data necessary to answer important questions and find and train qualified staff.
Find the Right Technology to Enhance Data Sharing
Agencies will also be working together to develop data protection toolkits, standardized research grant applications, automated inventory tools and more.
These are concrete actions toward a goal the government has been talking about for the past few years: using its vast resources of data to make better decisions and work more efficiently. Gartner named “analytics everywhere” as one of its top government technology trends for 2020, in which agencies move to processes that can provide information in near real time.
The trick is to find technology that makes this possible. Ware, in his speech, pointed to outdated and slow means of acquisition and procurement that automatically put an agency behind technologically.
This lag in acquisition may be one reason shadow IT exists: an employee can install an unapproved app that gets the job done faster than the agency can procure approved software.
The technology that lets an agency use all of its data must work throughout the data's lifecycle: ensuring the data is safe and usable when it is collected, enriching it with context for proper use, and helping other agencies predict outcomes when the data is shared with them.
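Those lifecycle stages can be sketched as a minimal pipeline. This is only an illustration: the `Record` fields, the enrichment rule, and the source name are hypothetical, not any agency's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical record type for illustration; real agency schemas will differ.
@dataclass
class Record:
    source: str
    payload: dict
    collected_at: str = ""
    tags: list = field(default_factory=list)

def collect(source: str, payload: dict) -> Record:
    """Collection stage: timestamp the record so downstream users can trust it."""
    return Record(source, payload,
                  collected_at=datetime.now(timezone.utc).isoformat())

def enrich(record: Record) -> Record:
    """Enrichment stage: attach context (here, a toy volume-based tag)."""
    if record.payload.get("bytes", 0) > 1_000_000:
        record.tags.append("high-volume")
    return record

def report(record: Record) -> dict:
    """Reporting stage: emit a shareable summary other agencies can act on."""
    return {"source": record.source,
            "collected_at": record.collected_at,
            "tags": record.tags}

summary = report(enrich(collect("netflow-sensor-1", {"bytes": 5_000_000})))
print(summary["tags"])  # ['high-volume']
```

Each stage takes the previous stage's output, so the same record carries its provenance and context from collection through to sharing.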
Storing all that data in the cloud — whatever environment an agency chooses — is an initial step. But there’s so much data in government that centralizing or standardizing it isn’t an option anymore.
Reference Architecture Creates a Foundation for Data
Some agencies turn to reference architecture — providing a foundation on which agency components can build their own IT structures and still maintain the ability to share data without compromising their needs. This is how the 17 intelligence community agencies handle their data.
Others choose to manage the data where it lives, either through virtualization, which runs more than one operating system at a time to expand a system's capability, or through containers, a more streamlined technology that helps software run consistently across different environments.
Still others are turning to enterprise data management tools that let them design ways to handle data transparently and quickly, allowing an agency to see data from a new viewpoint.
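One way to picture that unified viewpoint is a federated query that leaves each dataset where it lives. The sketch below is a toy analogy only, using two separate SQLite databases (with hypothetical `grants` tables) to stand in for data sources in different environments.

```python
import sqlite3

# Two separate databases stand in for data sources in different environments.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE grants (agency TEXT, amount INTEGER)")
con.execute("INSERT INTO grants VALUES ('NSF', 100), ('NIH', 250)")

# ATTACH brings a second, independent database into the same session.
con.execute("ATTACH DATABASE ':memory:' AS other")
con.execute("CREATE TABLE other.grants (agency TEXT, amount INTEGER)")
con.execute("INSERT INTO other.grants VALUES ('DOE', 75)")

# A single query presents a unified view without moving either dataset.
rows = con.execute(
    "SELECT agency, amount FROM grants "
    "UNION ALL "
    "SELECT agency, amount FROM other.grants "
    "ORDER BY agency"
).fetchall()
print(rows)  # [('DOE', 75), ('NIH', 250), ('NSF', 100)]
```

Real enterprise data management tools operate at far greater scale, but the principle is the same: one query surface over data that stays in place.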
In the government arena, this is critically important: Doctors can better see patterns of disease outbreaks, cyber professionals can better spot looming attacks, financial experts can catch suspicious spending patterns.
With the volume of data pouring into the federal government every day — and the new emphasis on data strategy from the White House — agencies must craft a method to organize, quantify, verify, explain and share their data in the most convenient way possible. The technology is available; they just have to choose the way that’s best for them.