Dedupe Evolves, EMC-Style
By Drew Robb
February 15, 2011
Debate followed about the best means of accomplishing deduplication: at the source or the target? Inline or offline? The bottom line appears to be that most approaches hold merit, and the best approach is to use those methods that suit a specific application.
Next up was smarter design of dedupe software to take traffic off the network. EMC, for example, introduced Boost software, which cuts down on bandwidth usage during backup. Yet despite these innovations, disk-based backup often struggled to live up to the hype of dispensing with the need for tape.
The popular view has been that disk should be used for a few months of backup (up to a year or two) while tape should be harnessed for archiving. Most recently, we have seen the introduction of the EMC Data Domain Archiver, which sees dedupe now attempting to take over the archiving space.
"The high end of the EMC Data Domain line just massively scaled in both performance and capacity," said David Chapa, an analyst at Enterprise Strategy Group. "Inline data deduplication storage systems can now scale to meet the needs of the largest enterprises."
EMC Makes Big Moves
Let's take a look at where deduplication is going by using the example of the progression of the technology at Data Domain, which was acquired by EMC in 2009.
Some technological breakthroughs struggle to sell their worth and take years to gain ground. Not so with deduplication. Its value proposition is a no-brainer: Instead of keeping hundreds of backup tapes, most of which contain the same data backed up over and over again, only back up those files that are new or have been changed. The result is reductions in backup traffic ranging from seven-to-one to 200-to-one.
Trucking company Pitt Ohio Express, for example, has reduced its daily backup window by 75 percent and taken restore times down from several days to five minutes since implementing a couple of Data Domain units.
"We are achieving data compression ratios of 200 to one for virtual machines and databases," said Jules Thomas, senior systems administrator at Pitt Ohio.
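To make the idea of "only back up what is new or changed" concrete, here is a minimal sketch of a deduplicating store. It is an illustration only, not a description of how Data Domain appliances work internally: the fixed 4 KB chunk size, the use of SHA-256 fingerprints, and the `DedupeStore` class are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


class DedupeStore:
    """Toy deduplicating store (hypothetical): keeps each unique chunk once."""

    def __init__(self):
        self.chunks = {}        # fingerprint -> chunk bytes
        self.logical_bytes = 0  # total bytes the backups asked us to keep

    def backup(self, data):
        """Store data, returning the list of fingerprints needed to rebuild it."""
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            # Only chunks that have never been seen consume new space.
            self.chunks.setdefault(fp, chunk)
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def restore(self, recipe):
        """Rebuild the original data from its list of fingerprints."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def physical_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())

    def ratio(self):
        """Logical bytes protected per physical byte stored."""
        return self.logical_bytes / self.physical_bytes()


# Seven nightly backups of the same unchanged 40 KB data set.
data = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
store = DedupeStore()
recipes = [store.backup(data) for _ in range(7)]

print(store.ratio())                      # 7.0, i.e. seven to one
print(store.restore(recipes[-1]) == data)  # True
```

Seven identical backups occupy the space of one, which is where a seven-to-one figure would come from in this toy case. Real ratios depend on how much the data actually changes between backups.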
Since EMC took over Data Domain almost two years ago, it has increased the amount of storage available within the deduplication appliances. These days, Data Domain offers:
The DD140 for remote offices, with up to 490 GB/hour of aggregate throughput and 43 TB of logical storage.
The DD600 Series, with up to 5.4 TB/hour of aggregate throughput and 2.7 PB of logical storage.
The DD800 Series, with up to 14.7 TB/hour of aggregate throughput and 14.2 PB of logical storage.
Of course, there is some creative accounting in those numbers. You don't actually get 14 PB of disk inside the DD800 boxes. By adding expansion shelves, you can boost the total to close to 400 TB, with the contention being that this is the equivalent of 14 PB of non-deduped data when you take into account average compression rates.
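The arithmetic behind that "creative accounting" can be checked in a few lines. This is a back-of-the-envelope sketch: it assumes decimal units (1 PB = 1,000 TB) and treats "close to 400 TB" as exactly 400 TB, so the result is approximate. The function names are made up for the example.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1,000 TB


def implied_ratio(logical_pb, physical_tb):
    """Reduction ratio needed for physical_tb of disk to hold logical_pb of data."""
    return logical_pb * TB_PER_PB / physical_tb


def logical_capacity_pb(physical_tb, ratio):
    """Logical capacity (in PB) that physical_tb of disk yields at a given ratio."""
    return physical_tb * ratio / TB_PER_PB


# DD800 Series figures quoted above: 14.2 PB logical on roughly 400 TB of disk.
print(round(implied_ratio(14.2, 400), 1))       # about 35.5 to one

# The same 400 TB at the low end of the range cited earlier (seven to one).
print(round(logical_capacity_pb(400, 7), 1))    # about 2.8 PB
```

In other words, the 14.2 PB figure presumes a reduction of roughly 35 to one. A data set that only dedupes at seven to one would fit far less logical data on the same disk.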
Regardless of the quibbles, though, the trend has been for more disk and greater throughput inside these machines over the last couple of years. The DD890, for instance, supports replication fan-in from other Data Domain systems installed at up to 180 remote offices.
A couple of other twists have been added to this expansion in capacity and throughput.
Boost software offloads most of the work that would otherwise be done at the Data Domain target device by moving it upstream to the backup media server. Instead of sending everything over the network during a backup (and having the appliance sort out which data is already recorded and only retain the new stuff), the decision occurs at the client side, before anything is transmitted to the Data Domain box. This cuts down considerably on network traffic.
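The difference between the two approaches can be sketched as follows. This is not the Boost API or its wire protocol, neither of which is described here. The `Target` class, the fixed 4 KB segment size, and the SHA-256 fingerprints are hypothetical, chosen only to show why deciding at the sending side reduces traffic. The byte counts ignore the small cost of exchanging fingerprints.

```python
import hashlib

SEGMENT_SIZE = 4096  # assumed segment size, for illustration only


def fingerprint(segment):
    return hashlib.sha256(segment).hexdigest()


def split(data):
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]


class Target:
    """Hypothetical stand-in for the deduplication appliance."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes

    def missing(self, fingerprints):
        """Report which fingerprints the target does not hold yet."""
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, new_segments):
        self.segments.update(new_segments)


def backup_target_side(data, target):
    """Send everything; the target discards what it already has."""
    for segment in split(data):
        target.segments.setdefault(fingerprint(segment), segment)
    return len(data)  # payload bytes that crossed the network


def backup_source_side(data, target):
    """Ask the target what it lacks first, then send only those segments."""
    by_fp = {fingerprint(s): s for s in split(data)}
    needed = target.missing(by_fp.keys())
    target.store({fp: by_fp[fp] for fp in needed})
    return sum(len(by_fp[fp]) for fp in needed)  # payload bytes sent


# Two backups of a 40 KB data set; one 4 KB segment changes in between.
first = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(10))
second = first[:-SEGMENT_SIZE] + bytes([99]) * SEGMENT_SIZE

t1 = Target()
target_side = [backup_target_side(first, t1), backup_target_side(second, t1)]

t2 = Target()
source_side = [backup_source_side(first, t2), backup_source_side(second, t2)]

print(target_side)  # [40960, 40960]
print(source_side)  # [40960, 4096]
```

Both targets end up holding the same 11 unique segments. The only difference is how many bytes had to travel to get them there on the second backup.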