EPA, Veterans Affairs Seek Insights from Big Data and Analytics Amid Data Deluge
With better data processing, the Securities and Exchange Commission could improve fraud detection drastically. Seems obvious, right?
But SEC CIO Pamela Dyson says that while this statement is obvious, it also reflects a quandary that’s neither easy nor fast to address, at the SEC or at other agencies across government. As Dyson notes, all the data in the world means little without efficient and accessible storage options.
“If you can analyze more data faster, you can do deeper examinations or you can examine more registrants,” she says. “But there’s a lot of data to collect, and right now, the ability and capability of our systems restrict us.”
A Data Avalanche Is Coming
The Environmental Protection Agency faces a similar dilemma. In the past, certain facilities recorded information using only one or two data points annually. Today, with the spread of connected sensors, those numbers are rising quickly. EPA predicts some facilities will soon report more than 100 million data points each year.
“That might even be a small estimate,” EPA Chief Technology Officer Greg Godbout says. “It’s a different world.”
EPA has monitored air quality for a long time, first on paper and then in electronic form, Chief Data Scientist Robin Thottungal says. “With Internet-connected sensors, we will enter an ecosystem where we collect information in millisecond intervals.”
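A rough calculation shows what millisecond sampling means in practice. The sketch below assumes one sensor reporting once per millisecond, around the clock, for a 365-day year; these are illustrative assumptions, not EPA figures.

```python
# Back-of-the-envelope scale check (illustrative assumptions, not EPA figures):
# one sensor sampling every millisecond, 24 hours a day, for a 365-day year.
SAMPLES_PER_SECOND = 1_000               # one reading per millisecond
SECONDS_PER_YEAR = 60 * 60 * 24 * 365    # 31,536,000 seconds

readings_per_year = SAMPLES_PER_SECOND * SECONDS_PER_YEAR

# EPA's projection for some facilities: 100 million data points a year.
EPA_PROJECTION = 100_000_000

ratio = readings_per_year / EPA_PROJECTION

print(f"Readings per sensor per year: {readings_per_year:,}")
print(f"Multiple of the 100 million projection: {ratio:.0f}x")
```

Under those assumptions, a single sensor would generate about 31.5 billion readings a year, roughly 315 times the 100 million data points EPA expects some facilities to report.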
That scenario, which finds EPA and other agencies gathering and processing hundreds of millions of times as much data as they had just a year ago, illustrates the concept of so-called “data-geddon,” a term used on HBO’s hit show “Silicon Valley” to describe exponential, seemingly limitless growth in information.
The prospect of collecting, storing and processing more data every year may seem daunting, but the situation also presents opportunities for government to leverage that data to gain new insights, improve processes and deliver higher-quality services.
Data Storage Challenges Abound
“Large amounts of data present both a resource and a burden,” says Shawn P. McCarthy, research director for IDC Government Insights. “While storage becomes less expensive, other investments must be made in processing power and software solutions.
“These solutions must handle large data sets for things like business analytics and pattern matching.”
Plus, from a security standpoint, large data stores are particularly enticing to hackers, McCarthy adds.
At the SEC, Dyson says storage environments must be updated to make Big Data analytics more efficient.
Her team plans to pilot a hybrid storage solution next year that aims to show where costs might be reduced while providing a higher level of functionality.
“We’re on physical storage now, and it’s very expensive,” she says. “We have all of this data in high-availability storage. Certainly, we can look at a more hierarchical storage solution.”
Dyson says the agency wants to create a cloud analytics program that delivers downloadable results. Her office has worked with a number of cloud vendors to develop a prototype to see what that could look like.
Running analytics in the cloud can offer higher performance than working with data locally. Dyson says the SEC needs a scalable solution so the agency can increase capacity when it’s needed.
“The one thing we don’t want to do is replicate these huge data sets locally if we can work with them in the cloud,” Dyson says. “The cloud’s elasticity is not available with other systems. We currently build for the peak, and that’s a very costly proposition.”
At EPA, Godbout says storage costs will not hinder the agency as data grows.
“Storage is in a race to the bottom,” he says, adding that third parties own much of the data EPA works with, and the storage burden typically falls to them. “The challenge is how we consume data and how it will be used, along with the data’s accuracy and quality.”
“This is a problem for everyone in the 21st century,” Godbout continues. “Unless we improve our processes for collection, the data will overwhelm us to the point where we can’t work with it.”
Filtering Data for Insights
A variety of technologies, including storage, security and analytics, must work in harmony if agencies are to manage the data boom effectively. As Dyson noted, the cloud, and hybrid cloud environments in particular, gives agencies a way to work with data without overloading systems.
But addressing these interdependencies also requires new policies. The Transportation Department, which anticipates a major uptick in data as autonomous cars take to the highways, is overhauling its data structure.
The plan includes better segmenting of data to filter out unnecessary information while ensuring that more valuable data is stored in the cloud for ready use by analytics programs.
To test its policy, DOT also has a pilot program that logs driving patterns using onboard sensors and back-end analytics. So far, the program has collected 4 petabytes of data from test drivers, who collectively drove for 1 million hours.
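The pilot’s two headline figures also imply an average data rate. This sketch assumes decimal units (1 petabyte = 10^15 bytes) and that data accrued evenly across every hour driven, which the reported totals do not confirm.

```python
# Average data generated per driving hour in DOT's pilot, from the reported totals.
# Assumes decimal (SI) units and an even spread across all hours driven.
PETABYTE = 10**15  # bytes
GIGABYTE = 10**9   # bytes
MEGABYTE = 10**6   # bytes

total_bytes = 4 * PETABYTE   # 4 petabytes collected so far
hours_driven = 1_000_000     # 1 million hours of test driving

bytes_per_hour = total_bytes / hours_driven
bytes_per_second = bytes_per_hour / 3600

print(f"{bytes_per_hour / GIGABYTE:.1f} GB per driving hour")
print(f"{bytes_per_second / MEGABYTE:.2f} MB per second")
```

That works out to roughly 4 GB for each hour a test vehicle was on the road, or a little over 1 MB every second, on average.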
That information, combined with platforms such as IBM’s Intelligent Transportation system, should greatly improve the way roads and other transportation infrastructure are built and maintained. Together, the new data governance approach and the pilot analytics effort will help DOT refine its strategy and see how it can scale to accommodate increasingly large data sets.
Taking Advantage of Analytics
There’s also a need to layer precision into the analytics, says Maureen Ellenberger, executive director for veteran relationship management at the Veterans Affairs Department. Agencies must develop new tools and techniques to answer particular questions with data, she says.
Big Data presents an unprecedented opportunity for Veterans Affairs, she says. “We can use Big Data — from health, to benefits, to cemeteries — and bring analytics together to provide a lifetime, longitudinal view of those we care for.”
Expansive data analytics also could help the agency better identify public health issues such as flu outbreaks, detect fraud and waste, reduce wait times and speed claims processing, but only if VA designs its data analytics programs appropriately.
EPA’s Godbout agrees and sees data analytics as an untapped gold mine, with the potential to create value for agencies in unforeseen ways.
“Data is not just letters and numbers,” he says. “It’s also video. It’s unstructured. It’s structured. It’s all of these different things.”
“Sometimes,” Godbout says, “you just don’t know how valuable the data will be until you put it in play.”