While the IC’s research organization looks into adding security to cloud environments, in the here and now, intelligence agencies are sharing more data.
Big Data is becoming an important tool for agencies, and as the data itself grows dramatically, the storage and data management solutions that agencies employ become more critical. As agencies deal with challenges such as implementing analytics and getting a handle on massive data files, they also must find the best fit among various storage options.
Some agencies with Big Data storage needs will focus on obtaining a large amount of capacity at a relatively low cost. For some applications, a key attribute of storage solutions and services is their metadata capability, including support for flexible, user-defined metadata.
Another enabling capability is policy management, which can use metadata for implementing or driving functions such as how long to retain data, when and how to securely dispose of it, and where to keep it (along with application-related information). This adds some flexible structure to unstructured data without the limits or constraints associated with structured data management.
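As a rough illustration of metadata-driven policy management, the sketch below maps user-defined metadata fields to retention and placement decisions. The policy names, field names ("category", "created") and retention periods are hypothetical, not drawn from any specific product.

```python
import datetime

# Hypothetical policy table: retention window and target tier keyed by a
# user-defined metadata category. Values are illustrative only.
POLICIES = {
    "surveillance": {"retain_days": 365, "tier": "tape"},
    "default": {"retain_days": 90, "tier": "disk"},
}

def policy_for(metadata):
    """Pick the policy matching an object's user-defined metadata."""
    return POLICIES.get(metadata.get("category"), POLICIES["default"])

def should_dispose(metadata, today):
    """True when an object's age exceeds its policy's retention window."""
    created = datetime.date.fromisoformat(metadata["created"])
    return (today - created).days > policy_for(metadata)["retain_days"]
```

A scheduler could run `should_dispose` over a catalog of objects to drive secure disposal, giving unstructured data the flexible structure the paragraph above describes.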
Finding the right storage medium can help an agency meet its needs. Hard disk drives (HDDs) have been a popular approach to providing a balance of performance, capacity, density and cost-effectiveness for many applications. This trend should continue as agencies retain more data for longer periods.
Big Data can also benefit from today’s solid-state drive solutions that use dynamic random-access memory or NAND flash memory — or a combination of both — to support bandwidth needs. SSDs can also be used to store metadata and other frequently accessed items.
Tape continues to play a number of roles in Big Data. These include transporting large amounts of data in a timely manner and providing an archive or gold-master backup of data kept on disk.
Deduplication is not always an effective technique for maximizing Big Data storage capacity. Agencies should consider different tools, technologies and techniques to lessen the impact of storing and protecting their ever-growing data sets.
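To see why deduplication is not always effective, consider a simple hash-based dedup check: repetitive data (backups, VM images) deduplicates well, while unique data such as compressed video or raw sensor streams yields almost no savings. This is a minimal sketch, not any vendor's implementation.

```python
import hashlib
import os

def dedup_ratio(chunks):
    """Fraction of chunks that are exact duplicates of an earlier chunk."""
    seen = set()
    dupes = 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:
            dupes += 1
        seen.add(digest)
    return dupes / len(chunks)

# Repetitive data deduplicates almost entirely...
repetitive = [b"same block contents" for _ in range(100)]
# ...while unique chunks (stand-ins for compressed or encrypted data) do not.
unique = [os.urandom(4096) for _ in range(100)]
```

When `dedup_ratio` comes back near zero, the capacity problem has to be attacked with the other techniques below instead.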
For example, a Big Data project could use archiving or automated tiering to migrate some data to a slower or lower-cost tier of storage, such as tape, whether that tier resides online, near-line or offline.
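Automated tiering of this kind often keys off time since last access. The thresholds and tier names below are illustrative assumptions, not a standard policy.

```python
# Hypothetical age-based tiering rule: the longer an object goes
# untouched, the cheaper (and slower) the tier it migrates to.
def target_tier(days_since_access):
    """Map time since last access to an illustrative storage tier."""
    if days_since_access < 30:
        return "ssd"        # hot data stays on fast media
    if days_since_access < 365:
        return "disk"       # warm data on capacity HDDs
    return "tape"           # cold data archived near-line or offline
```

A tiering job would periodically compare each object's current tier against `target_tier` and migrate the mismatches.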
One option to reduce the data footprint is to rethink how, when, where and why data is protected. Another is compression (real-time or time-deferred), which can leverage different algorithms to reduce storage demands.
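The real-time versus time-deferred trade-off can be sketched with two standard-library codecs: a fast, low-effort pass for the ingest path, and a slower, higher-ratio pass for batch jobs. The choice of zlib and LZMA here is illustrative; agencies would evaluate algorithms against their own data.

```python
import lzma
import zlib

data = b"sensor reading 42.0\n" * 5000  # stand-in for a repetitive data set

# Real-time path: fast compression at a modest effort level.
fast = zlib.compress(data, level=1)

# Time-deferred path: slower but typically a tighter ratio, suited to
# overnight or batch processing windows.
tight = lzma.compress(data)
```

Either path shrinks repetitive data substantially; the deferred pass trades CPU time and latency for additional capacity savings.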
Protecting Big Data requires basic reliability, availability and serviceability — capabilities such as redundant power, cooling, controllers, nodes and interfaces. Agencies also should ensure data integrity and durability by conducting background data scrubs to detect parity or protection errors and bit-rot, among other inconsistencies. These background checks should be transparent to normal running operations and should correct inconsistencies before they expand into problems.
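A background scrub of the kind described above can be sketched as a checksum sweep: each block's stored digest is compared against a fresh read, and mismatches flag silent corruption for repair. This is a minimal illustration, not a description of any particular array's scrub.

```python
import hashlib

def scrub(blocks, stored_digests):
    """Return indices of blocks whose current hash no longer matches the
    digest recorded at write time (i.e., suspected bit-rot)."""
    bad = []
    for i, block in enumerate(blocks):
        if hashlib.sha256(block).digest() != stored_digests[i]:
            bad.append(i)
    return bad
```

In a real system this sweep runs at low priority so it stays transparent to normal operations, and flagged blocks are rebuilt from parity or replicas before the damage spreads.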
Agencies also should revisit RAID levels to optimize their Big Data storage solution. Factors to consider include how many drives are in a RAID pool or group, and chunk or I/O size, as well as the sizes and types of devices being used, which may be optimized for smaller amounts of data.
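The RAID factors listed above lend themselves to back-of-the-envelope arithmetic: how much usable capacity a group yields, and how large a full-stripe write becomes at a given chunk size. The drive counts and sizes below are illustrative.

```python
def raid6_usable_tb(drives, drive_tb):
    """RAID 6 reserves two drives' worth of capacity for dual parity."""
    return (drives - 2) * drive_tb

def full_stripe_kb(drives, chunk_kb, parity_drives=2):
    """A full-stripe write spans one chunk on every data drive."""
    return (drives - parity_drives) * chunk_kb
```

For example, twelve 4 TB drives in RAID 6 yield 40 TB usable, and with a 128 KB chunk a full stripe is 1,280 KB, which matters when the workload issues much smaller I/Os.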
Some Big Data solutions used for analytics employ clusters or grids of industry-standard x86 or ia64 servers with internal or dedicated storage, along with application software.
Big Data applications can also leverage existing storage systems that are optimized for different uses. Some storage systems intended for traditional high-performance computing can be a good fit for bandwidth-intensive concurrent or parallel access applications using block or file access methods.
Storage solutions with object access (including HTTP, XML and cloud data management interface) are also an option for Big Data storage needs such as video, audio, image, surveillance, seismic or geographic data, among other applications with large files or items to store. Object storage systems support variable sizes and different types of data, ranging from kilobytes to gigabytes.
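Object access can be pictured as a flat key namespace holding variable-size objects with attached metadata. The in-memory class below is a hypothetical sketch of that interface; real object stores expose equivalent put/get/head operations over HTTP or CDMI.

```python
# Minimal in-memory sketch of object-style storage: variable-size objects
# with user-defined metadata under a flat key namespace. Method names
# mirror common HTTP verbs but are illustrative, not a real store's API.
class ObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        """Store an object of any size along with its metadata."""
        self._objects[key] = {"data": data, "metadata": metadata or {}}

    def get(self, key):
        """Retrieve an object's contents."""
        return self._objects[key]["data"]

    def head(self, key):
        """Retrieve only the metadata, as an HTTP HEAD would."""
        return self._objects[key]["metadata"]
```

Because keys are flat and objects carry their own metadata, the same interface serves kilobyte-sized records and gigabyte-sized video files alike.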
In general, the many facets of Big Data applications impose varied storage requirements. Knowing an agency's needs and the available options can support data growth while minimizing budget growth.