Apr 01 2021

How Federal Agencies Can Make the Most of Unstructured Data

Dell PowerScale storage technology enables better analysis of vast troves of unstructured government data.

The Air Force captures all kinds of data while its planes fly. Sensors record weather information, the aircraft collects GPS mapping data and onboard systems track operational and airframe performance.

Usually that data would come into the Air Force in an “unstructured” form. The weather data would fall into one storage silo, the GPS data into another, and the performance statistics would stay in a separate storage unit.

If the service branch wanted to determine a relationship between, say, the weather and flight performance, it would have to process the information with algorithms that could communicate across silos to look for patterns and meaningful insights. The Air Force IT department would have to write specific code and bridging software to do that analysis and find a way to apply artificial intelligence across data sets.

Instead, with a different kind of storage system, all the unstructured files could flow into the same “data lake,” explains Dallas Nash, senior director of sales in Dell Technologies’ unstructured data solutions group. A data lake has none of the barriers that silos do. All the information is inherently consolidated for AI and machine learning to fish out and tie together.

“Just imagine the data and the business intelligence that can be gleaned by the Air Force coming out of that singular repository,” Nash says.


“We want to rip down those silos,” he adds. “We want to take all of that resident data and insert it into a data lake where you can run queries against it and you can find the missing gaps of information that you may be looking for.”

Nash’s group at Dell EMC developed the PowerScale storage system to fill those data lakes and help federal agencies make the most of their unstructured data. PowerScale uses the Isilon OneFS operating system to store and manage that information across a variety of node types, which let users scale capacity and performance and decide which data gets priority.

The Challenges Associated with Unstructured Data

Unstructured data is any electronic file that doesn’t land in a structured platform, such as a database, where it is easy to organize and access. Government data centers are full of unstructured files collected from outside sources such as social media, blog posts and websites, along with images and videos.

Some agencies might set up their systems to capture those files in a database, which structures them for sorting and searching. Otherwise, the files stay unstructured and end up in silos that limit their usefulness, Nash explains.

Deloitte Insights estimated in a 2017 report that as much as 80 percent of all data generated by an organization is unstructured. Other analysts have estimated that the portion of unstructured data is closer to 90 percent.

The volume of unstructured data has grown so quickly that agencies trying to manage it have struggled to keep up.

Market research firm IDC has forecast that, within three years, the volume of unstructured data across commercial and government sectors will reach 849 exabytes.

“This massive data growth is driven by growth within the datacenter as well as several new initiatives, such as IoT, artificial intelligence and analytics and spans across the edge-core-cloud,” IDC noted in July 2020.

Unstructured data storage has evolved from arrays to appliances to silos. As they generated more data, agencies stacked up appliances, then added more silos. They needed additional IT staff to manage that infrastructure, increasing costs and decreasing efficiency, Nash says.

“At the end of the day, for us, we are aggressively pushing the envelope, not just on how people consume IT but how they buy it and how they have to live with it,” he says.


Gaining Insights from Unstructured Data

With the data lake that PowerScale creates, a government health agency could store patient X-rays in a way that allows machine learning to identify common characteristics of bone degeneration or disease development, leading to faster diagnoses.

“The machine can understand much more than the human mind about the patterns that are starting to occur in the pathology that have been gleaned from the data,” Nash says.

The Navy has been collecting data on the biology of the world’s oceans so it can develop a better coating that keeps its ship hulls from rusting or degrading, Nash says. “They’re trying to get the most effective design and creation models, and they need to understand the type of marine life that these vessels are moving through with regularity,” he says.

PowerScale gives federal agencies the flexibility to prioritize data accessibility on different tiers. They can put the most urgent information in a high-performance flash storage tier.

They can drop more routine data into a node where it’s available for reference or comparison but doesn’t require the speed of flash, which is costly. As data gets older and less relevant, an agency might want to hang onto it but not need it as readily available, so it could automatically drop down the stack to a lower “deep and cheap” tier, Nash says.
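To make the idea of data dropping down the stack more concrete, here is a minimal Python sketch of an age-based placement rule. It is not PowerScale’s own tiering feature or its configuration syntax; the tier names (“flash,” “hybrid,” “archive”), the 30-day and 365-day thresholds and the file paths are hypothetical assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tiers and age thresholds, for illustration only
# (not PowerScale settings). Checked in order: the first rule whose
# maximum age covers the file wins.
TIER_RULES = [
    (timedelta(days=30), "flash"),    # recently used: high-performance tier
    (timedelta(days=365), "hybrid"),  # routine reference data
]
ARCHIVE_TIER = "archive"  # the "deep and cheap" tier for everything older


@dataclass
class FileRecord:
    path: str
    last_accessed: datetime


def choose_tier(record: FileRecord, now: datetime) -> str:
    """Return the tier a file should live on, based on time since last access."""
    age = now - record.last_accessed
    for max_age, tier in TIER_RULES:
        if age <= max_age:
            return tier
    return ARCHIVE_TIER


def plan_moves(records, current_tiers, now):
    """List (path, from_tier, to_tier) for every file whose tier should change."""
    moves = []
    for rec in records:
        target = choose_tier(rec, now)
        current = current_tiers.get(rec.path)
        if current != target:
            moves.append((rec.path, current, target))
    return moves


if __name__ == "__main__":
    now = datetime(2021, 4, 1)
    # Hypothetical files, all currently sitting on the flash tier.
    files = [
        FileRecord("/lake/weather/today.csv", now - timedelta(days=2)),
        FileRecord("/lake/gps/2020-q4.bin", now - timedelta(days=120)),
        FileRecord("/lake/perf/2018.log", now - timedelta(days=1000)),
    ]
    tiers = {f.path: "flash" for f in files}
    for move in plan_moves(files, tiers, now):
        print(move)
```

Here the recent weather file stays on flash, the 120-day-old GPS file is planned for the hybrid tier and the 1,000-day-old performance log for the archive tier. As described above, the storage platform is meant to make such moves automatically; the sketch only shows the kind of rule a tiering policy encodes.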

Dell is increasingly encouraging the use of PowerScale at the edge of data collection, literally out in the field. One of its customers is a Canadian organic produce distributor that works with small farmers all over the world and studies climate patterns to improve yields.

Sensors on the farm pick up humidity levels or the pH balance of the soil. Farmers can run their own analysis on that data, and it is also pushed to the cloud so the distributor can better plan its supplies and routes.

A federal agency like the Agriculture Department could use this edge-to-core-to-cloud flow of unstructured data to study the nation’s farming communities and help them maximize their operations, Nash says.

“It’s really about how we can better move the human world forward and driving human ingenuity,” he says. “The solutions come together from the software, from the people, from the processes, and we’re not looking to replace those things. We’re looking to augment them so that our customers, our partners and our technology partners do this better, faster and better together.”
