Small data stores, small problems. Big data stores, potentially big problems.
When it comes to locating the specific data set you need in a large pool of data, bigger isn't necessarily better. Finding what you need can be frustrating and time-consuming in a large data store, and the difficulty compounds when the storage architecture itself is complicated and unwieldy.
To address these issues, many organizations are turning to storage provisioning, which makes storage resources available to applications and servers on demand. The process can be done manually, but it takes many steps: visiting the storage array, allocating logical unit numbers on individual disk drives, mapping them back to individual servers and making sure the route to the source is available. For that reason, many organizations prefer automated storage provisioning.
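To make those steps concrete, here is a minimal Python sketch of what a provisioning tool automates. It is a toy in-memory model, not any vendor's API: the names StorageArray, Lun, allocate_lun, map_lun and provision are all hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Lun:
    """A logical unit number carved out of the array (hypothetical model)."""
    lun_id: int
    size_gb: int
    mapped_to: Optional[str] = None


@dataclass
class StorageArray:
    """Toy in-memory stand-in for a storage array; not a vendor API."""
    free_gb: int
    reachable_servers: Set[str] = field(default_factory=set)
    luns: List[Lun] = field(default_factory=list)

    def allocate_lun(self, size_gb: int) -> Lun:
        # Carve a LUN out of the array's remaining free capacity.
        if size_gb > self.free_gb:
            raise ValueError("not enough free capacity on the array")
        self.free_gb -= size_gb
        lun = Lun(lun_id=len(self.luns), size_gb=size_gb)
        self.luns.append(lun)
        return lun

    def map_lun(self, lun: Lun, server: str) -> None:
        # Refuse to map the LUN unless a route to the server is available.
        if server not in self.reachable_servers:
            raise ConnectionError(f"no path from array to {server}")
        lun.mapped_to = server


def provision(array: StorageArray, server: str, size_gb: int) -> Lun:
    """Allocate a LUN and map it to a server, undoing the allocation on failure."""
    lun = array.allocate_lun(size_gb)
    try:
        array.map_lun(lun, server)
    except ConnectionError:
        # Roll back so a failed request does not strand capacity.
        array.luns.remove(lun)
        array.free_gb += size_gb
        raise
    return lun
```

In this sketch, a call such as provision(array, "app01", 40) replaces the separate allocate, map and verify steps, and a request for an unreachable server leaves the array's free capacity unchanged.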
Provisioning tools are available through many vendors, including EMC, Sun Microsystems and Hewlett-Packard, as well as niche vendors such as Red Hat and Altiris.
The ideal candidate for storage provisioning is an agency that allocates storage on a daily or even weekly basis, says Greg Schulz, a senior analyst at StorageIO Group, a storage consultant in Stillwater, Minn.
"It's not necessarily the size or type of environment, but how much time you're spending," he says. "If you're only provisioning storage once or twice a week, and it takes you 15 minutes, it's no big deal. But if you have to provision storage 10 times per day and it takes you 10 minutes each time, that's significant."
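Schulz's comparison is easy to put in numbers. The short Python sketch below does the arithmetic, assuming a five-day work week (an assumption added here, not a figure from Schulz) and using the hypothetical helper name weekly_minutes.

```python
def weekly_minutes(requests_per_week: int, minutes_per_request: int) -> int:
    """Total minutes spent on manual provisioning in one week."""
    return requests_per_week * minutes_per_request


# Light workload from the quote: twice a week, 15 minutes each time.
light = weekly_minutes(2, 15)

# Heavy workload from the quote: 10 times per day, 10 minutes each time.
# Assumption: a five-day work week, so 50 requests per week.
WORK_DAYS_PER_WEEK = 5
heavy = weekly_minutes(10 * WORK_DAYS_PER_WEEK, 10)

print(f"light workload: {light} minutes per week")
print(f"heavy workload: {heavy} minutes per week ({heavy / 60:.1f} hours)")
```

Under that assumption, the light workload costs about half an hour a week, while the heavy one takes 500 minutes, or more than eight hours.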
Besides saving time, storage provisioning also can greatly improve data center management.
"If you're managing a data center, you want to make sure that the service you're delivering is responding quickly, that the application is available and that your data is 100 percent intact," Schulz says. "Storage provisioning makes sure that you're reaching your availability goals and doesn't inadvertently go offline when you provision something. It also ensures that you haven't created a performance bottleneck or done something that results in lost or corrupted data."
Although many types of organizations can benefit from storage provisioning, federal agencies, which often maintain large stores of data and complex storage infrastructures that include technology from multiple makers, are an exceptionally good fit, says R.B. Hooks, CTO for public-sector storage at Sun.
"If you're dealing with massive amounts of data, you have to look at how many people it would take to manage your data, because as you grow, that scales the number of people you'll need," he says. "Provisioning allows you to use fewer people because you've got policies that make the decisions for you."
Dealing with large quantities of data was only one reason officials at the Agriculture Department's Aerial Photography Field Office (APFO) adopted storage provisioning. The field office manages 45 terabytes of national geospatial data in two data sets, with expected annual growth of more than 15TB, through a geospatial data warehouse housed in a hierarchical storage management system.
Until just a few years ago, the USDA office filled requests for geographic data sets from other agencies and private organizations by searching through reams of tape and stacks of CDs that housed the geospatial data. Filling a request could take from several hours to a few days, says David Nabity, geospatial data manager at the APFO in Salt Lake City. Managing the data sets also became increasingly complicated, he says.
Over the past four years, APFO officials have overhauled the way the organization manages and fulfills data set requests, adding storage provisioning software on the front end and a relational database management system to house the data on the back end. Today, organizations requesting data sets have two choices: accessing them directly through GeoSpatial One-Stop, a Web portal offering access to maps, data and other geospatial services, or submitting a request to APFO.
"These automated efforts save time and ease the management of processing and archiving large data sets, and the easy-to-use user interface has significantly reduced the research time for locating requested data," Nabity says.
Through the provisioning software, users can apply one of several search tools to identify their desired data, define provisioning parameters and produce data sets by creating, copying or providing symbolic links to the generated files. If an organization prefers, it can continue to submit requests for data sets that APFO employees then fill using the same process. Employees can process the request by partitioning the requested data to external hard drives, CDs or DVDs.
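The difference between copying a file and providing a symbolic link to it can be shown in a few lines of Python. This is only an illustrative sketch using the standard library; the function name deliver and its mode parameter are hypothetical and do not describe APFO's software.

```python
import shutil
from pathlib import Path


def deliver(source: Path, dest_dir: Path, mode: str = "link") -> Path:
    """Place a data file in a delivery directory by copy or symbolic link.

    Hypothetical helper for illustration only.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / source.name
    if mode == "copy":
        # A full, independent copy: uses more space but survives
        # changes to or removal of the original file.
        shutil.copy2(source, target)
    elif mode == "link":
        # A symbolic link: nearly free to create, but it only points
        # at the original file. May need extra privileges on Windows.
        target.symlink_to(source.resolve())
    else:
        raise ValueError(f"unknown delivery mode: {mode!r}")
    return target
```

A link suits large files that stay in the warehouse, while a copy is the natural choice when data must leave on an external drive or disc.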
The research process, which once took several hours, now takes as little as 10 minutes, Nabity says. "With the provisioning system, you just go into an application and do an area search. Once you find your area of interest, the automation kicks in and pulls down the data you need," he says.