There is a lot of buzz in the IT world these days about data deduplication, but what exactly is it and why would I want it?

Let me start by defining what deduplication is. Suppose you have a file set containing Word documents, Excel spreadsheets, CAD drawings and text files; there are common components across those files. As a simple analogy, think of the alphabet and the numbers 0 through 9. Each of these files is likely to contain the same characters over and over. If the letter “a” appears many times across 10 of these files, then with data deduplication we store the first “a” once, and any time we need another “a” we simply point to the one already stored, note that pointer in a table and move on to the next character, “b”, and so on. The same goes for the numbers, spaces, symbols and special characters. The effect is to reduce the amount of data stored on your disk.
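In practice, deduplication engines work on blocks or chunks of data rather than on individual characters, but the principle is exactly the one above: keep one copy of each unique piece and point to it everywhere else. Here is a minimal Python sketch of that idea; the 4 KB chunk size, SHA-256 fingerprints and in-memory dictionary are purely illustrative choices, not how any particular product does it.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed-size chunks; real products often use variable-size chunking

def deduplicate(data: bytes, store: dict) -> list:
    """Split data into chunks, keep each unique chunk once in `store`,
    and return the list of fingerprints (the 'pointer table')."""
    pointers = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:       # first time we see this chunk: store it
            store[fingerprint] = chunk
        pointers.append(fingerprint)       # every repeat is just a pointer to the stored copy
    return pointers

def rehydrate(pointers: list, store: dict) -> bytes:
    """Rebuild the original data from its pointer table."""
    return b"".join(store[fp] for fp in pointers)

# The same 4 KB block repeated 1,000 times collapses to a single stored chunk.
block = b"0123456789abcdef" * 256
data = block * 1000
store = {}
pointers = deduplicate(data, store)
assert rehydrate(pointers, store) == data
print(f"{len(pointers)} chunk references, {len(store)} unique chunk(s) actually stored")
```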

Now that we know, at a high level, how data deduplication works, what are the different ways to take advantage of it?

We can deduplicate data locally. By this I mean that a file sits on one server and its local disk, and we deduplicate the data within that single system.

Alternatively, we can deduplicate by sending data to a device that can accept input from many sources. That way we can deduplicate across many servers, or even many sites. And the more data you send to a deduplication device, the more duplicates it can find, so the better the deduplication ratio.
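To make the local-versus-shared difference concrete, here is a rough sketch that reuses the same chunk-and-fingerprint idea as above, comparing one index per server with a single shared index. The data and proportions are invented purely for illustration.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # illustrative, as in the sketch above

def stored_bytes(datasets, index):
    """Index every chunk of every dataset; return the bytes actually kept on disk."""
    for data in datasets:
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            index.setdefault(hashlib.sha256(chunk).hexdigest(), len(chunk))
    return sum(index.values())

def dedup_ratio(datasets, shared):
    """Logical bytes divided by stored bytes, using one shared index or one index per server."""
    logical = sum(len(d) for d in datasets)
    if shared:
        stored = stored_bytes(datasets, {})                     # one device sees everything
    else:
        stored = sum(stored_bytes([d], {}) for d in datasets)   # each server dedupes alone
    return logical / stored

# Ten servers that mostly hold the same content (an OS image, shared documents)
# plus a slice of data unique to each one (the proportions are made up).
random.seed(0)
common = random.randbytes(400_000)
servers = [common + random.randbytes(40_000) for _ in range(10)]

print(f"per-server indexes: {dedup_ratio(servers, shared=False):.1f}:1")
print(f"one shared index:   {dedup_ratio(servers, shared=True):.1f}:1")
```

The duplicate chunks hiding inside any one server are the same either way; what changes is that a shared index also catches the duplicates that exist between servers.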

For example, if I deduplicate on a single server only, the amount of repeated data I can find is limited to that one server. If I send data from 10 servers instead, the number of duplicates goes up considerably, and I can reduce the total amount of disk needed across multiple sites. Think in ratios. With local deduplication it takes time to reach high ratios, perhaps topping out around 8:1, but when we send data from multiple servers and sites we reach higher ratios faster and may even get as high as 9.9:1. This means that if I needed 10 terabytes (TB) of disk for 10 TB of data before deduplication, an 8:1 ratio brings that down to about 1.25 TB, and 9.9:1 to just over 1 TB.
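The arithmetic behind those figures is just division, as the small sketch below shows; the ratios you actually achieve depend entirely on how repetitive your data is.

```python
# Physical disk needed for 10 TB of logical data at a few example deduplication ratios.
logical_tb = 10
for ratio in (2, 5, 8, 9.9):
    print(f"{ratio:>4}:1 deduplication -> {logical_tb / ratio:.2f} TB of physical disk")
```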

When you look at your projected data growth and the percentage of your IT budget that goes to servicing that increase, wouldn’t it be nice if you could stretch those dollars?

And it’s not just the disk you save on. There are also costs associated with storage management: data centre space (power and cooling), operations staff, maintenance contracts and insurance. Storing all your data in a deduplicated format translates into real savings on each of these.

With some of the newer data deduplication methods out there, you can now store deduplicated data in multiple locations: one copy locally, one at a second site and another in the cloud. These days you can even store the data with two cloud providers at the same time, all while maintaining ownership of your data and control over where it resides and who can access it.

So, if you are in a crunch for storage space, or are even restricted from growing your storage footprint, a deduplication solution lets you defer those costs and keep operating.

Interested in seeing how this works for yourself? Test the power of deduplication on your own company’s content.