Dabbling in Data Deduplication
Data deduplication is an easy concept to grasp: Don’t back up each copy of a 4-megabyte PowerPoint from every employee’s hard drive. You know, the one with the dancing bar graphs to distract you from flat sales? If that file really needs to be saved (dubious), then just save one copy, not 20, and let the backup software keep track so users can restore the dancing bar graphs in the future.
There are three reasons why organizations, no matter how small, should investigate a backup process that includes data deduplication: to reduce the size of the backup data, to reduce the time it takes to process the backup data, and to upgrade backup processes (always a good idea).
Reducing the size of the backup data might not seem important anymore. After all, new 2-terabyte hard drives are almost as cheap as a tank of gas these days (especially for those with trucks and SUVs). You can certainly afford a few more hard drives for the backup hardware.
But your hardware might not be able to handle the new drives, so you must buy new backup hardware too — meaning the price tag just went up (like gasoline). Or, your current backup processes might be backing up too much, meaning you’re buying new disks to hold information that shouldn’t be saved at all. Worse, you’re saving that same worthless information over and over.
You may not have the time to run a backup process that transfers several TB of data. If your servers support people around the clock, when do you have four hours to lock users out and back up? This is especially true if you back up remotely. If you take up a remote site’s Internet bandwidth during work hours, the remote office will have terrible Internet response time, and people will grumble.
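Some rough, purely illustrative arithmetic shows why that window matters (the data sizes and link speeds below are assumptions, not measurements from any real site):

```python
# Back-of-the-envelope backup window math. Sizes and link speeds are
# illustrative assumptions, not figures from any particular setup.
def transfer_hours(data_tb: float, link_mbps: float) -> float:
    """Hours needed to push data_tb terabytes over a link_mbps link at full speed."""
    bits = data_tb * 8 * 10**12            # terabytes -> bits (decimal TB)
    seconds = bits / (link_mbps * 10**6)   # bits divided by bits-per-second
    return seconds / 3600

print(f"{transfer_hours(2, 1000):.1f} hours")  # ~4.4 hours on a 1 Gbps link
print(f"{transfer_hours(2, 100):.1f} hours")   # ~44.4 hours on a 100 Mbps link
```

Even two terabytes blows well past a four-hour window on anything slower than a dedicated gigabit link, and that is exactly the traffic deduplication is supposed to keep off the wire.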
You can dedupe your data before or after you complete your backup. If you run dedupe software on the same server holding the data to back up, the software can churn through, find the duplicated files or disk blocks (if your backup is more granular), and pick one instance to back up. This works great if your backup window is constrained, because only the unique data is transmitted.
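To make the file-level case concrete, here is a minimal sketch of the idea in Python, assuming a simple hash-based approach. The function name is invented for illustration; real backup software is far more sophisticated and typically works on disk blocks rather than whole files.

```python
import hashlib
from pathlib import Path

def files_to_back_up(root: str):
    """Walk a directory tree and keep only one instance of each identical file.

    Illustrative helper, not a real backup tool. Files count as duplicates when
    their SHA-256 digests match; duplicates are skipped, so only unique content
    would be transmitted.
    """
    seen = {}            # digest -> first path seen with that content
    duplicates = []      # paths we can safely skip
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append(path)    # the same dancing-bar-graph deck again
        else:
            seen[digest] = path        # back this one up
    return list(seen.values()), duplicates
```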
External dedupe appliances can do the same thing by organizing data from multiple servers and clients before sending data to another location for redundancy (or disaster recovery). Client software for cloud backup systems runs deduplication before uploading data. This is called source deduplication.
Target deduplication accepts backup data as fast as possible to shorten the backup window, then analyzes and deduplicates the data afterward. This requires intelligent dedupe appliances that also compress the data, such as those from Quantum, which makes a 2.2TB appliance that can hold as much as 40TB of uncompressed data.
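The same trick works below the file level. The sketch below is a simplified, hypothetical take on what a target appliance might do after ingest: chunk the stored stream and keep each unique chunk only once. Real appliances use variable-size chunking, compression, and much cleverer indexing, and none of the names here come from any vendor's API.

```python
import hashlib
import io

CHUNK_SIZE = 4096  # fixed-size chunks; real appliances use variable-size chunking

def deduplicate_stream(stream, chunk_store):
    """Split an ingested backup stream into chunks, storing each unique chunk once.

    Returns the 'recipe' (ordered list of digests) needed to rebuild the stream.
    Names are made up for illustration; no real appliance exposes this API.
    """
    recipe = []
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:
            chunk_store[digest] = chunk   # new data: keep the chunk
        recipe.append(digest)             # known data: just reference it
    return recipe

store = {}
recipe = deduplicate_stream(io.BytesIO(b"\0" * CHUNK_SIZE * 3), store)
print(len(recipe), "chunk references,", len(store), "chunk actually stored")
# -> 3 chunk references, 1 chunk actually stored
```

For what it's worth, a 2.2TB appliance advertising 40TB of logical capacity implies roughly an 18-to-1 reduction ratio, which you only see when the same data shows up in backup after backup.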
If your organization doesn’t have 40TB of data to back up, look at it this way: Those 40TB might include 90 days of backups. The more backups a data safety administrator has, the better he or she sleeps.
One critical step that people sometimes overlook: the backup destination must be absolutely rock-solid. When deduplication leaves you with only one stored copy of a file, do all you can to ensure that copy is well maintained. Use high-quality disks and redundancy to protect your data.
In 10 years, deduplication will be automatic. But today, with so many organizations reporting inadequate backup processes, upgrading to a backup system that includes data deduplication should be a priority.