It’s called Big Data for a reason.
The federal government collects data on nearly every aspect of life from energy and transportation to finance and security, and everything in between.
With more devices collecting data, agencies find themselves with larger amounts of data than ever before. To manage that growth, and avoid letting it overwhelm them, federal IT leaders leverage storage solutions that can securely host large volumes of data without overburdening the budget.
Those solutions arrive at a critical time, as data growth nears a dangerous tipping point.
“Agencies are forced to find solutions as information grows at exponential levels,” says Joe Garber, vice president of marketing for information management and governance at Hewlett Packard Enterprise. “If not controlled now, the data could grow so large that, no matter how much technology agencies throw at it, they may never get a comfortable handle on it.”
The Cost of Data
It costs between $4 and $100 to store a single gigabyte of data over the course of its lifetime, according to a 2014 study from Enterprise Strategy Group. Those numbers factor in security, accessibility and the sensitivity of the information, with $25 per GB seen as the standard for enterprise-class organizations.
And those figures apply to all types of data, whether recent and actionable, or outdated and irrelevant. Distinguishing between those types is an obvious way to keep storage costs in check, but that task is easier said than done.
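A quick back-of-the-envelope calculation shows why that distinction matters. The sketch below uses the per-gigabyte figures cited above ($25/GB as the enterprise standard) and the 40 to 70 percent low-value share Garber describes; the function name and the 1 PB example are illustrative, not drawn from the study.

```python
def lifetime_storage_cost(total_gb, cost_per_gb=25.0, low_value_share=0.4):
    """Return (total lifetime cost, portion spent storing low-value data)."""
    total = total_gb * cost_per_gb
    wasted = total * low_value_share
    return total, wasted

# One petabyte at the $25/GB enterprise standard, assuming 40 percent
# of the data has no value (the low end of Garber's range).
total, wasted = lifetime_storage_cost(1_000_000)
print(f"Lifetime cost: ${total:,.0f}; spent on low-value data: ${wasted:,.0f}")
# → Lifetime cost: $25,000,000; spent on low-value data: $10,000,000
```

Even at the low end of the range, millions of dollars can go toward storing data that holds no value, which is why governance pays for itself.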
“Many organizations realize that, on average, between 40 and 70 percent of their data has no value,” Garber says. “The problem is that it’s not clear where that 40 to 70 percent lies.”
The first step to effective storage is to determine what data holds value and what data does not. Information governance isn’t a new concept, of course, but its importance continues to grow as rapidly as data grows.
Agencies once perceived governance as such an overwhelming undertaking that any alternative looked easier. As a result, they delayed putting rules in place to categorize data. Garber says agencies can no longer wait to make those decisions. He points to technologies, such as ControlPoint and Structured Data Manager, that help large organizations look at legacy data and make decisions about value.
Those products, and others like them, enable agencies to start small with one repository or location.
“It has become a stair-stepped approach to governance,” Garber says.
Shifting to the Cloud
As agencies better categorize existing data, they must also determine how storage-worthy data will be used. Considerations include the frequency with which certain data is used, along with the required confidentiality of each data set. Such analysis helps IT better determine where to store each data set, such as within a private, public or hybrid cloud solution.
“Ultimately, federal customers will use a hybrid cloud environment,” says Rob Stein, vice president for NetApp’s U.S. public sector division. “I talk to a lot of federal CIOs, and data storage is usually one of the top five things they want from the cloud.”
Stein says most agencies choose private or dedicated cloud options for certain applications, but more and more look to the public cloud for solutions that can scale rapidly and efficiently, such as Microsoft Azure.
Audie Hittle, federal chief technology officer for EMC’s Emerging Technologies division, sees many of the same trends. He says agencies initially turned to public clouds to realize cost savings, but have since shifted to a hybrid storage approach for stronger security protections.
With that setup, agencies store their most confidential data on-premises in a private cloud, hosting less sensitive data off-premises in a public cloud. Rarely used data can also go to the public cloud, as it presents the lowest-cost storage option.
“As much as they’d like to be further along, agencies are in an exploratory phase,” Hittle says. “They are determining what data can be moved to the public cloud to take advantage of those savings.”
Then there’s the challenge of moving the data sets to the cloud.
“Many federal agencies own petabytes of data, or more, and the sheer size overwhelms even the most high-bandwidth connections,” says Wayne Webster, Nimble Storage's vice president of federal sales. “That not only affects movement of data between on-premises and cloud, but also cloud to cloud.”
One solution is what Nimble calls cloud-connected storage, a hybrid approach that separates compute and storage resources while keeping the data itself under the agency’s control.
Webster says cloud-connected storage allows data to be stored on an independent storage array that securely connects to a public cloud. That allows the agency to leverage on-demand burstable compute resources. Agencies can also physically move the data when required, or leverage efficient array-based data replication that only sends changed and compressed data to reduce bandwidth needs.
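The replication technique Webster describes — sending only changed, compressed data — can be sketched in miniature: hash fixed-size blocks, transmit only the blocks whose hashes changed since the last sync, and compress what goes over the wire. The block size, function names, and data here are illustrative assumptions, not Nimble’s actual implementation.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # illustrative block size

def changed_blocks(data, last_hashes):
    """Yield (offset, compressed block) for each block changed since last sync.

    last_hashes maps block offsets to their hashes from the previous sync
    and is updated in place once the generator is exhausted.
    """
    new_hashes = {}
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        new_hashes[offset] = digest
        if last_hashes.get(offset) != digest:
            yield offset, zlib.compress(block)
    last_hashes.clear()
    last_hashes.update(new_hashes)

# First sync sends every block; after a single byte changes,
# only the one affected block needs to travel.
hashes = {}
data = bytearray(b"x" * (4 * BLOCK_SIZE))
first_sync = list(changed_blocks(bytes(data), hashes))
data[0] = ord("y")
second_sync = list(changed_blocks(bytes(data), hashes))
print(len(first_sync), len(second_sync))  # → 4 1
```

On the second sync, one changed byte costs one compressed block rather than the full data set, which is the bandwidth saving that makes petabyte-scale replication between on-premises arrays and clouds tractable.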
The connection includes Federal Information Processing Standards (FIPS)-certified encryption. As a result, agencies can balance workloads across multiple cloud providers, switching connections between clouds to take advantage of availability or spot-instance pricing. This approach, Webster says, neutralizes cloud vendor lock-in: because the data resides on agency-owned storage devices, it can no longer be held hostage by steep egress costs.
Flash Storage Offers Flexibility
Flash has emerged as another option to reduce storage costs or increase performance when it’s time to access data. One strategy is to put recent data on flash for fast access. As that data ages and is accessed less frequently, agencies can migrate it to another storage technology.
Nimble’s storage on-demand service is one of the most popular options, offering a flash storage array for private and hybrid clouds, with per-gigabyte, per-month pricing.
“The on-demand service gives federal agencies the same flexibility and agility as cloud storage, while gaining performance of flash and the ability to physically secure data on an independent storage array,” Webster says.
EMC’s Rack Scale Flash features similar capabilities.
“Rack Scale Flash is a blend of direct-attached storage and shared/network storage,” Hittle says. “It enables the best of both worlds: super-fast, CPU-connected flash storage and shared storage for Big Data analytics. This is the year that flash and high-performance spinning disk intersect on the price-per-terabyte point curve.”
Hittle says agencies will increasingly turn to flash because of its reduced space and power requirements.
Some agencies have already made the switch. The Department of Energy’s National Energy Research Scientific Computing Center currently uses flash from NetApp. Stein believes other agencies will follow suit, predicting that flash will replace spinning disk in enterprise-class storage over the next 12 to 18 months.
“It’s a technology that our customers want to use,” Stein says.
Others caution that flash isn’t a cure-all for information overload. Rodney Billingsley, Tintri’s senior federal director, says that while flash has become mainstream, it’s not a panacea. “In our experience, flash is often seen as the ‘end’ rather than the ‘means.’ Our federal customers come to us because their conventional storage architecture doesn’t jibe with their virtualized applications.”
“Throwing flash at the problem doesn’t solve the underlying architecture issue,” he continues, “but simply kicks the can down the road.”
Billingsley says virtual machine-aware storage might serve as a longer-term solution, as it prompts user organizations to rethink storage architectures and strategies.
“In terms of virtualized workloads, federal customers need to scrutinize the underlying data architecture to solve the data at scale challenge,” Billingsley says. “The traditional industry architectures built on storing data within logical unit numbers or volumes, even on new all-flash hardware, will never address the insight and isolation needed to support modern virtualized workloads.”
Billingsley compares building a virtualized data architecture without an upfront plan to baking a cake and forgetting the eggs. “If you don’t think about how to solve the virtualized data architecture challenge from inception, it’s impossible to go backward.”