The federal government is awash in data, but can it make sense of all of that information and make it more useful and secure? That’s the goal of Donna Roy, a top data official at the Department of Homeland Security, who wants to turn Big Data into “smart data.”
Roy, executive director of the information sharing and services office at DHS, thinks that the Internet of Things and associated sensors will significantly increase the amount of data federal agencies ingest and analyze. Speaking in a keynote session on Wednesday at the MeriTalk Big Data Brainstorm conference in Washington, D.C., Roy said that she wants government to be able to “ingest and link data” much as she does on a website like Ancestry.com.
“They make it very easy for me to find data in a vast amount of collections of data, and they make it in a way that is consistently linked and consistently processed,” she said. “I want more of that.”
The government needs to adopt commercial data practices — like software that recognizes who a person is in a photo as that photo is being taken — but it needs to do so in a safe and secure manner, Roy said.
Agencies need to create networks and algorithms that can operate at the speed of IoT, she said, and deal with data processing on the edge of networks, as happens with connected industrial control systems in power plants, for example. “I want these things to be ubiquitous in government, and I want us to be able to do them,” she said. “But we’re not there yet.”
Going from Big Data to Smart Data
The government faces many challenges with Big Data, according to Roy. One is a plain information glut, which leaves agencies taking on too much data. Another is what she called the “dark side” of Big Data and analytics, in which connected devices can be hacked. A third is information spoofing, in which a malicious party impersonates another device or user on a network to launch attacks against network hosts, steal data, spread malware or bypass access controls.
Even after avoiding those pitfalls, Roy said, agencies need to “get data ready for analysis at speed,” meaning that data should be in a form that is ready to be analyzed as soon as it hits analytics platforms. Currently, she said, her teams spend about 80 percent of their time just searching, ingesting and getting data ready for analysis. “Can we get to smarter data and smarter data crunching?” she asked.
That raises the question: What is “smart data”? Roy said that it’s data that is independent of software, applications, devices or networks but is still actionable. It’s also data that is self-describing and self-protecting, with its own context and semantics. She pointed to all of the digital data that accompanies a music file downloaded from iTunes, for example, as a kind of smart data: independent, portable, with its own description and protections built in.
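As a rough illustration of what “self-describing and self-protecting” could mean in practice, the Python sketch below bundles a record with its own context and an integrity tag. It is not a DHS design or anything Roy described: the function names `wrap` and `verify`, the field names and the shared key are all hypothetical, and an HMAC tag only detects tampering rather than controlling who may read the data.

```python
import hashlib
import hmac
import json


def _canonical(body: dict) -> bytes:
    # Serialize deterministically so identical content always yields identical bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def wrap(payload: dict, context: dict, key: bytes) -> dict:
    """Bundle a payload with its own description (context) and an integrity tag."""
    body = {"context": context, "payload": payload}
    tag = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify(record: dict, key: bytes) -> bool:
    """Return True only if the body still matches the tag it was issued with."""
    expected = hmac.new(key, _canonical(record["body"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["tag"])


# Hypothetical example: a music track that carries its own description.
record = wrap(
    payload={"title": "Example Song", "artist": "Example Artist"},
    context={"type": "music_track", "fields": {"title": "string", "artist": "string"}},
    key=b"demo-key",  # assumption: a shared secret, used here only for illustration
)
```

Because the record is plain JSON, it can be handed from one system to another and checked on arrival with `verify(record, key)`, without the receiving application knowing anything about the software that produced it.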
Agencies could use smart data to help evaluate people who are seeking refugee status in the U.S., Roy said. Smart data could also let patients’ medical records move from doctor to doctor without a patient having to do anything. Law enforcement case information could become more mobile, shareable, self-describing and self-protected, she suggested.
In the world of IoT, vendors are starting to put data processing and identity capabilities onto silicon chipsets so that sensor data can be processed at the network edge. Agencies need to take advantage of those kinds of capabilities, Roy said.
“I think we need a bit of a data revolution,” Roy said, as well as support from the Office of Management and Budget to adopt governmentwide approaches to this kind of data portability.
Moving to a New Data Paradigm
Roy suggested that the government needs to reform how it treats data, and cited the European Union’s “once only” data principle as a model worth emulating.
As the EU explains: “Citizens and businesses should have the right to supply information only once to a public administration. Public administration offices should be able to take action to internally share this data, respecting the data protection rules, so that no additional burden falls on citizens and businesses.”
In this scenario, once a citizen provides his or her data to, say, apply for a Social Security card, he or she would not need to provide that same data again when applying for a fishing license or medical benefits.
Roy acknowledged that the United States is a large and complex country and that such an undertaking would be difficult, but she still urged the government to move toward that mode. “The driver for the ‘once only’ principle is getting them to accept common data standards for distinct data,” she said.
Agencies need to think about what data they should be computing, not just how they should compute it, Roy added. They should also focus on getting the right answers out of Big Data platforms rather than just proliferating them and collecting more data. “I’m not sure data glut gets us anything,” she said.