
What They’re Not Telling You About Global Deduplication

Published 01/29/2016

By Rachel Holdgrafer, Business Content Strategist, Code42

When it comes to endpoint backup, is global deduplication a valuable differentiator?

Not if data security and recovery are your primary objectives.

Backup vendors that promote global deduplication say it minimizes the amount of data that must be stored and provides faster upload speeds. What they don’t say is how data security and recovery are sacrificed to achieve these “benefits.”

Here’s a key difference: with local deduplication, data redundancy is evaluated and removed on the endpoint before data is backed up. Files are stored in the cloud per user and are easily located and restored to any device. With global deduplication, all data is sent to the cloud, but only one instance of each data block is stored, no matter how many users hold a copy.
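The storage difference between the two models can be sketched in a few lines. This is an illustration only, not any vendor's implementation: the fixed 4-byte block size, the use of SHA-256 as the block fingerprint, and the names BlockStore, split_blocks, alice and bob are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the example data stays readable


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Hypothetical content-addressed store: one copy per unique block."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> list[str]:
        """Store data and return the ordered list of block fingerprints."""
        recipe = []
        for block in split_blocks(data):
            key = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(key, block)  # keep only the first copy
            recipe.append(key)
        return recipe

    def get(self, recipe: list[str]) -> bytes:
        """Rebuild the original data from its block fingerprints."""
        return b"".join(self.blocks[key] for key in recipe)


# Two users back up the same file; it contains the block "AAAA" twice.
handbook = b"AAAABBBBAAAA"
files_by_user = {"alice": [handbook], "bob": [handbook]}

# Local model (as assumed here): duplicates are removed within each
# user's own data, and every user keeps a separate store.
local_stores = {user: BlockStore() for user in files_by_user}
for user, files in files_by_user.items():
    for data in files:
        local_stores[user].put(data)

# Global model: every user writes into one shared store.
global_store = BlockStore()
recipes = {
    user: [global_store.put(data) for data in files]
    for user, files in files_by_user.items()
}

local_total = sum(len(store.blocks) for store in local_stores.values())
global_total = len(global_store.blocks)
print(local_total, global_total)
```

In this toy run the per-user stores hold four blocks in total and the shared store holds two. The saving comes entirely from the copy that both users have in common.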

They tell you: “You’ll store less data!”

It’s true that global deduplication reduces the number of files in your data store, but that’s not always a good thing. At first blush, storing less data sounds like a benefit, especially if you’re paying for endpoint backup based on data volume. But other than potential cost savings, how does storing less data actually benefit your organization?

Not as much as you think.

For most organizations, the bulk of the files removed by the global deduplication process will be unstructured data such as documents, spreadsheets and presentations. These files are not typically big to begin with, so the storage savings from global dedupe are minimal. The files that gobble up the bulk of your data storage, such as databases, video and design source files, are unlikely to be floating around in duplicate.

What they don’t tell you: Storing less data doesn’t actually benefit your organization. Smaller data stores benefit the solution provider. Why? Data storage costs money and endpoint backup providers pay for huge amounts of data storage and bandwidth every month. By limiting the data stored to one copy of each unique file, the solution provider can get away with storing less data for all of its customers, resulting in smaller procurement costs each month—for them.

Vendors that offer global dedupe also fail to mention that it puts an organization at risk of losing data because (essentially) all the eggs are in one basket. When one file or data block is used by many users but saved just once (e.g., the HR handbook for a global enterprise, sales pitch decks or customer contact lists), all users will experience the same file loss or corruption if the single instance of the file is corrupted in the cloud.
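A minimal sketch of that "one basket" risk follows. It assumes blocks are keyed by their SHA-256 fingerprint and that a restore verifies each block against its key. The user names, the handbook contents and the restore_ok helper are hypothetical, and the sketch ignores any replication or integrity repair a real service might add.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


handbook = b"HR handbook, 2016 edition"
key = fingerprint(handbook)

# Global model: one stored instance, referenced by every user.
global_store = {key: handbook}
global_refs = {"alice": [key], "bob": [key]}

# Per-user model: each user has a separate stored copy.
per_user_store = {
    "alice": {key: handbook},
    "bob": {key: handbook},
}


def restore_ok(store: dict[str, bytes], keys: list[str]) -> bool:
    """A restore succeeds only if every block still matches its fingerprint."""
    return all(fingerprint(store[k]) == k for k in keys)


# Simulate corruption of a single stored instance in each model.
global_store[key] = b"HR handbook, 2016 edit\x00\x00\x00"
per_user_store["alice"][key] = b"HR handbook, 2016 edit\x00\x00\x00"

global_affected = [
    user for user, keys in global_refs.items()
    if not restore_ok(global_store, keys)
]
per_user_affected = [
    user for user, store in per_user_store.items()
    if not restore_ok(store, [key])
]
print(global_affected, per_user_affected)
```

With one shared instance, a single bad block breaks the restore for both users. With separate copies, only the user whose copy was damaged is affected.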

They tell you: “It uploads data faster.”

First, let’s define “faster.” The question is, faster than what? Admittedly, there’s a marginal difference in upload speeds between global and local deduplication, but it’s a lot like comparing a Ferrari and a Maserati. If a Ferrari tops out at 217 miles per hour and a Maserati tops out at 185 miles per hour, clearly the Ferrari wins. It’s technically faster, but considering that the maximum legal speed on most freeways is 70-75 miles per hour, the additional speed of both vehicles is a moot point. Both cars are wickedly fast, but a person is not likely to get to drive either at its top speed, so does the difference matter? The fact is, it doesn’t.

The same can be said about the speed “gains” achieved by utilizing global deduplication over local deduplication. Quality endpoint backup solutions will provide fast data uploads regardless of whether they use global deduplication or local deduplication. There’s a good chance that there will be no detectable difference in speed between the two methods because upload speed is limited by bandwidth. Global deduplication promoters are positioning speed as a benefit you will not experience.

What they don’t tell you: Global deduplication comes at a cost. Restores will be orders of magnitude slower than restores of data that has been locally deduplicated. Here’s why: with global deduplication, all of your data is stored in one place and only one copy of a unique file is stored in the cloud regardless of how many people save a copy. Rather than store multiples of the same file, endpoint backup that utilizes global deduplication maps each user to the single stored instance. As the data store grows in size, it becomes harder for the backup solution to quickly locate and restore a file mapped to a user in the giant data set.

Imagine that the data store is like a library. Mapping is like the Dewey Decimal System, only the mapped books are stored as giant book piles rather than by topic or author. When the library is small, it’s relatively easy to scan the book spines for the Dewey Decimal numbers. However, as the library collection (that is, book piles) gets larger, finding a single book becomes more time-consuming and resource-intensive.

Data storage under the global deduplication framework is like the library example above. Unique files or data blocks are indexed as they come into the data store and are not grouped by user. When the data store is small, it’s relatively easy for the system to locate all of the data blocks mapped to one user when a restore is necessary. As the data store grows in size, the process of locating all of the data blocks takes longer. This slows down the restore process and forces the end user to wait at the most critical point in the process—when he or she needs to get files back in order to continue working.
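To make "not grouped by user" concrete, here is a small sketch of a shared store that appends unique blocks in arrival order and keeps a per-user map of fingerprints. The GlobalStore class, its log/index/manifests layout and the three user names are assumptions for illustration. It shows only where one user's blocks end up, not how long a real restore takes.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class GlobalStore:
    """Hypothetical shared store: blocks are kept in arrival order."""

    def __init__(self) -> None:
        self.log: list[bytes] = []                  # unique blocks, as they arrive
        self.index: dict[str, int] = {}             # fingerprint -> position in log
        self.manifests: dict[str, list[str]] = {}   # user -> ordered fingerprints

    def backup(self, user: str, blocks: list[bytes]) -> None:
        for block in blocks:
            key = fingerprint(block)
            if key not in self.index:               # store each unique block once
                self.index[key] = len(self.log)
                self.log.append(block)
            self.manifests.setdefault(user, []).append(key)

    def restore(self, user: str) -> tuple[bytes, list[int]]:
        """Return the user's data and the log positions that were read."""
        positions = [self.index[key] for key in self.manifests[user]]
        data = b"".join(self.log[pos] for pos in positions)
        return data, positions


store = GlobalStore()

# Three users upload in turn, so their blocks interleave in the log.
for round_no in range(3):
    for user in ("alice", "bob", "carol"):
        store.backup(user, [f"{user}-block-{round_no};".encode()])

data, positions = store.restore("bob")
print(positions)
```

Even in this tiny example, bob's three blocks sit at positions 1, 4 and 7 of a nine-block store, interleaved with everyone else's data. Every block in a restore is a separate lookup into an index that spans all users.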

The real security story: What you’re not being told about global deduplication doesn’t stop there. Two-factor encryption doesn’t mean what you think it does. Frankly, an encryption key coupled with an administrator password is NOT two-factor encryption. It’s not even two-factor authentication. It’s simply a password layered over a regular encryption key. Should someone with the encryption key compromise the password, he or she will have immediate access to all of your data.

Conclusion

Companies that deploy endpoint backup clearly care about the security of their data. They count on endpoint backup to reliably restore their data after a loss or breach. Given the vulnerabilities exposed by the global deduplication model, it is counterintuitive to sacrifice security and reliability in a backup model in favor of “benefits” that profit the seller or cannot be experienced by the buyer.

To learn more about how endpoint backup with local deduplication is a more strategic data security choice, download the ebook, Backup & Beyond.
