How Agencies Can Optimize for High Data Ingestion at the Edge

The ability to take in and analyze large amounts of data in real time is critical to many agencies’ missions.

Darren Pulsipher is chief solutions architect for the public sector at Intel. He also hosts a weekly podcast called Embracing Digital Transformation.

Government agencies are collecting massive amounts of data at the edge, but are their systems optimized to make the most out of this information? Some signs point to “no.”

A recent report from data analytics firm Splunk indicates the public sector is technologically unprepared to harness the power of the data that will be created over the next few years. That is problematic for organizations that rely on edge computing for actionable intelligence.

Indeed, most government agencies are employing a wide array of devices — data ingestion points — to collect and analyze data at the edge. In fact, there are so many devices collecting so much data that the ability to handle high rates of data ingestion is critical if that information is to be turned into actionable intelligence for users.

Consider a couple of potential scenarios. An agency might be actively monitoring and running analytics and threat detection on its security and systems logs. That could be an enormous amount of data that must be ingested at a high rate so that it remains complete and provides a detailed picture of the organization’s security posture.

Alternatively, multiple sensors on a Navy ship might be scanning surrounding waters for enemy vessels or recording 4K video streams that must be processed right at the edge for real-time insights.

In short, the ability to ingest and analyze high data volumes is important if agencies are to make the most of the possibilities offered by edge analytics. To accomplish this goal, agencies must take three steps: scrub the data that is being ingested to ensure its quality, use storage systems built for high data volumes, and minimize high-volume data transfers.

Keep the Garbage Data out of Agency Systems

The term “garbage in, garbage out” has become a cliché because it’s true. Technologies being deployed at the edge, like artificial intelligence and machine learning, are only as good as the data they are collecting.

The challenge is that data being collected at the edge is raw and must be cleaned and normalized before it can be effectively analyzed. This is tedious and time-consuming, but very necessary.

The federal government is relying heavily on AI and machine learning to improve outcomes. Its trust in those technologies will only be rewarded if the data is of high integrity. Data cleansing ensures more accurate and more trustworthy recommendations.
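As an illustration of the cleaning and normalization step described above, here is a minimal Python sketch. The record schema (`sensor_id`, `timestamp`, `value`) and the function name `clean_records` are assumptions made for this example, not part of any agency system; real pipelines will have their own fields and validation rules.

```python
from datetime import datetime, timezone

# Hypothetical schema: every usable record must carry these fields.
REQUIRED_FIELDS = ("sensor_id", "timestamp", "value")


def clean_records(raw_records):
    """Drop incomplete, malformed or duplicate records and normalize the rest."""
    seen = set()
    cleaned = []
    for rec in raw_records:
        # Skip records that are missing any required field.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        try:
            # Normalize the reading to a float and the Unix timestamp to UTC.
            value = float(rec["value"])
            ts = datetime.fromtimestamp(float(rec["timestamp"]), tz=timezone.utc)
        except (TypeError, ValueError, OverflowError, OSError):
            # Malformed numbers or out-of-range timestamps are discarded.
            continue
        # Normalize identifiers so " Radar-1 " and "radar-1" match.
        sensor_id = str(rec["sensor_id"]).strip().lower()
        key = (sensor_id, ts)
        if key in seen:
            continue  # Duplicate reading from the same sensor and instant.
        seen.add(key)
        cleaned.append(
            {"sensor_id": sensor_id, "timestamp": ts.isoformat(), "value": value}
        )
    return cleaned
```

Running a step like this before data reaches the analytics or machine learning stage keeps incomplete and duplicate readings from skewing the results.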

Change the Way Data Is Stored

Once the cleansing process is complete, the data must be stored in a highly scalable database. Traditional storage appliances are not appropriate, as they are not built to manage very high data volumes.

Increasing the speed of temporary storage is essential for faster processing and analysis of large amounts of data. Most systems today ingest data to temporary storage before passing it along to long-term storage for processing. This process can be sped up by employing Non-Volatile Memory Express (NVMe) drives, which offer consistent performance and high read and write throughput.

Avoiding temporary storage altogether is an even better option. Persistent memory offers enormous scalability, making it ideal for high data volumes. Data can be ingested directly into the persistent memory module without first passing through temporary storage. This saves time, allowing the data to be stored and analyzed more quickly.

Minimize High-Volume Data Transfers

It’s even better to decrease the amount of data sent back to the data center. This is where the true promise of edge analytics begins: By pushing analytics to the edge, only the most critical data needs to be sent to the central core for more in-depth processing.

If, for example, there is enough compute power at the edge to process a video stream and identify relevant objects in a scene and time series, the system may only need to send certain parts of the data back to the data center for further processing. Connected devices within the network can share data with each other to develop an accurate assessment of a situation without having to rely on a core data center.
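The filtering idea above can be sketched in a few lines of Python. This is a hedged illustration only: the `Detection` fields, the `select_for_core` function and the 0.8 confidence threshold are hypothetical choices for the example, and the object detection itself is assumed to have already run on the edge device.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One object detected in a video frame by an edge device (hypothetical)."""
    camera_id: str
    frame_index: int
    label: str
    confidence: float


def select_for_core(detections, labels_of_interest, min_confidence=0.8):
    """Return compact metadata for the detections worth sending to the core.

    Only detections with a relevant label and sufficient confidence are kept,
    and only their metadata is forwarded, not the raw video frames.
    """
    return [
        {
            "camera_id": d.camera_id,
            "frame_index": d.frame_index,
            "label": d.label,
            "confidence": d.confidence,
        }
        for d in detections
        if d.label in labels_of_interest and d.confidence >= min_confidence
    ]
```

In this sketch the edge device forwards a handful of small records instead of the full video stream, and the data center can request the underlying footage only when deeper analysis is needed.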

For instance, if the Transportation Security Administration is tracking a suspicious individual at an airport, it can use multiple cameras without having to centralize all video data in one place.

Those connected cameras can share information about the individual and deliver relevant intelligence back to airport security. Deeper analysis can be done at the core if necessary, but the large video files do not need to be sent back to the core for initial analysis.

This saves the TSA a great deal of time and bandwidth, while still providing the information the agency needs to make instant decisions on what to do with the suspect.

As the amount of data government agencies collect at the edge continues to grow, the ability to ingest data at high rates will become increasingly important.

Taking these steps today will help agencies make the most of this information, maximize their investment in edge networks and, most important, provide users with critical insights in times of need.
