4 Infrastructure Requirements for Any Big Data Initiative

To take advantage of Big Data, agencies must ensure their technology stacks — including storage, servers, networking capacity and analysis software — are up to the task.

Federal agencies, like organizations in virtually every sector, are handling more data than ever before.

According to Cisco Systems, global IP traffic is expected to more than double in the span of only a few years — growing to a monthly per-capita total of 25 gigabytes by 2020 (up from 10GB per capita in 2015).
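The two Cisco figures above can be turned into a growth factor and an implied yearly rate. The sketch below is only an illustration: it assumes steady compound growth between 2015 and 2020, which the forecast itself does not state, and the function names are made up for this example.

```python
# Figures quoted above (Cisco forecast): gigabytes per person per month.
gb_2015 = 10.0
gb_2020 = 25.0
years = 2020 - 2015


def growth_factor(start, end):
    """How many times larger the end value is than the start value."""
    return end / start


def compound_annual_growth_rate(start, end, years):
    """Yearly rate that would turn `start` into `end` after `years` years.

    Assumption: growth compounds evenly each year.
    """
    return (end / start) ** (1.0 / years) - 1.0


factor = growth_factor(gb_2015, gb_2020)
cagr = compound_annual_growth_rate(gb_2015, gb_2020, years)

print(f"Growth factor: {factor:.1f}x")        # Growth factor: 2.5x
print(f"Implied annual growth: {cagr:.1%}")   # Implied annual growth: 20.1%
```

Under that assumption, "more than double" works out to 2.5 times the 2015 level, or roughly 20 percent growth per year.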

This data boom presents a massive opportunity to find new efficiencies, detect previously unseen patterns and increase levels of service to citizens, but Big Data analytics can’t exist in a vacuum. Because of the enormous quantities of data involved in these solutions, they must incorporate a robust infrastructure for storage, processing and networking, in addition to analytics software.

While some organizations already have the capacity to absorb Big Data solutions, others will need to expand existing resources or add new capacity so they retain headroom as demand grows. The chain here is only as strong as its weakest link: if storage and networking are in place but the processing power isn’t there, or vice versa, a Big Data solution simply won’t function properly.
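The weakest-link idea can be sketched as a simple minimum over the stages of a pipeline. The throughput numbers and the `bottleneck` helper below are hypothetical, chosen only to illustrate the point; real figures would come from measuring an agency's own systems.

```python
def bottleneck(stage_throughput_gbps):
    """Return (stage_name, throughput) for the slowest stage.

    End-to-end throughput of a pipeline cannot exceed its slowest stage,
    so the minimum is the effective rate for the whole chain.
    """
    name = min(stage_throughput_gbps, key=stage_throughput_gbps.get)
    return name, stage_throughput_gbps[name]


# Hypothetical sustained throughput per stage, in gigabits per second.
pipeline = {"storage": 16.0, "network": 10.0, "processing": 4.0}

slowest_stage, effective_rate = bottleneck(pipeline)
print(f"Pipeline limited to {effective_rate} Gbps by {slowest_stage}")
# Pipeline limited to 4.0 Gbps by processing
```

In this made-up case, adding storage or network capacity would change nothing until processing is upgraded.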

1. Storage

Often, organizations already possess enough storage in-house to support a Big Data initiative. (After all, the data that will be processed and analyzed via a Big Data solution is already living somewhere.) However, agencies may decide to invest in storage solutions that are optimized for Big Data. While not necessary for all Big Data deployments, flash storage is especially attractive due to its performance advantages and high availability.

Large users of Big Data — companies such as Google and Facebook — utilize hyperscale computing environments, which are made up of commodity servers with direct-attached storage, run frameworks like Hadoop or Cassandra and often use PCIe-based flash storage to reduce latency. Smaller organizations, meanwhile, often utilize object storage or clustered network-attached storage (NAS).

Cloud storage is an option for disaster recovery and backups of on-premises Big Data solutions. While the cloud is also available as a primary source of storage, many organizations — especially large ones — find that the expense of constantly transporting data to the cloud makes this option less cost-effective than on-premises storage.

2. Processing

Servers intended for Big Data analytics must have enough processing power to support this application. Some analytics vendors, such as Splunk, offer cloud processing options, which can be especially attractive to agencies that experience seasonal peaks. If an agency has quarterly filing deadlines, for example, that organization might securely spin up on-demand processing power in the cloud to process the wave of data that comes in around those dates, while relying on on-premises processing resources to handle the steadier, day-to-day demands.
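The seasonal-peak scenario above amounts to splitting each period's workload between fixed on-premises capacity and on-demand cloud capacity. The following is a rough planning sketch, not any vendor's method: the workload numbers, the core-hour unit and the `split_demand` function are all assumptions made for illustration.

```python
def split_demand(demand_by_period, on_prem_capacity):
    """Split each period's demand into (on_prem, cloud) portions.

    On-premises resources absorb demand up to their capacity;
    anything above that is treated as on-demand cloud work.
    """
    plan = []
    for demand in demand_by_period:
        on_prem = min(demand, on_prem_capacity)
        cloud = max(0, demand - on_prem_capacity)
        plan.append((on_prem, cloud))
    return plan


# Hypothetical weekly workload in core-hours; weeks 3 and 4 stand in
# for the surge around a quarterly filing deadline.
weekly_demand = [400, 450, 1200, 900, 420]
ON_PREM_CAPACITY = 500  # hypothetical core-hours available per week

plan = split_demand(weekly_demand, ON_PREM_CAPACITY)
cloud_total = sum(cloud for _, cloud in plan)

for week, (on_prem, cloud) in enumerate(plan, start=1):
    print(f"Week {week}: on-premises {on_prem}, cloud {cloud}")
print(f"Total cloud core-hours: {cloud_total}")  # Total cloud core-hours: 1100
```

Only the two deadline weeks spill into the cloud here, which is the pattern that makes on-demand capacity attractive compared with buying servers sized for the peak.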

3. Analytics Software

Agencies must select Big Data analytics products based not only on what functions the software can complete, but also on factors such as data security and ease of use. One popular function of Big Data analytics software is predictive analytics — the analysis of current data to make predictions about the future. Predictive analytics are already used across a number of fields, including actuarial science, marketing and financial services. Government applications include fraud detection, capacity planning and child protection, with some child welfare agencies using the technology to flag high-risk cases.

Many agencies have already begun to test Big Data applications or put them into production. In 2012, the Obama administration announced the Big Data Research and Development Initiative, which aims to advance state-of-the-art core Big Data projects, accelerate discovery in science and engineering, strengthen national security, transform teaching and learning, and expand the workforce needed to develop and utilize Big Data technologies. The initiative involved a number of agencies, including the White House Office of Science and Technology Policy, the National Science Foundation, the National Institutes of Health, the Defense Department, the Defense Advanced Research Projects Agency, the Energy Department, the Health and Human Services Department and the U.S. Geological Survey.

4. Networking

The massive quantities of information that must be shuttled back and forth in a Big Data initiative require robust networking hardware. Many organizations are already operating with networking hardware that facilitates 10-gigabit connections, and may have to make only minor modifications — such as the installation of new ports — to accommodate a Big Data initiative. Securing network transports is an essential step in any upgrade, especially for traffic that crosses network boundaries.
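To get a feel for what a 10-gigabit connection means in practice, the back-of-the-envelope calculation below estimates how long a data set takes to cross a link. It is a simplified sketch: the `efficiency` parameter is a hypothetical stand-in for protocol overhead and contention, and the 80 percent value is an assumption, not a measured figure.

```python
def transfer_seconds(data_bytes, link_gbps, efficiency=1.0):
    """Estimate seconds needed to move `data_bytes` over a link.

    link_gbps  -- nominal link speed in gigabits per second
    efficiency -- assumed fraction of nominal speed actually achieved (0-1]
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    bits = data_bytes * 8
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits / usable_bits_per_second


ONE_TERABYTE = 10**12  # decimal terabyte, in bytes

ideal = transfer_seconds(ONE_TERABYTE, link_gbps=10)
derated = transfer_seconds(ONE_TERABYTE, link_gbps=10, efficiency=0.8)

print(f"1 TB at 10 Gbps, ideal: {ideal:.0f} s")         # 800 s
print(f"1 TB at 10 Gbps, 80% usable: {derated:.0f} s")  # 1000 s
```

At nominal speed, one terabyte takes a little over 13 minutes; the same arithmetic scales linearly for larger data sets or slower links.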


PashaIgnatov/Thinkstock
Dec 22 2016
