Making Dedupe Faster
By Drew Robb
October 6, 2010
With its Avamar and Data Domain dedupe products, EMC is in the right place at the right time to take advantage of the growing trend toward disk-based data protection, said David Hill, an analyst at Mesabi Group.
These days, just about every big storage company has gotten into the dedupe game: Quantum, IBM, Symantec and others.
Understandably, these companies have been jockeying for position in terms of speed, data compression ratios, required bandwidth and so on. So it comes as no surprise that EMC would be investing heavily in R&D to make its deduplication products even better.
The latest upgrade to Data Domain deduplication is known as Boost. It moves the bulk of the work from the Data Domain target device upstream to the backup media server. This actually moves the product closer to the functionality of Avamar, which deduplicates data at the client side before the data is backed up.
"Boost reduces the load on the backup server and frees up even more bandwidth for the Data Domain appliance," said Frank Slootman, president of EMC's backup and recovery systems division. It acts like a turbocharger for inline deduplication.
Those installing EMC NetWorker backup software will soon find Boost already included as a free addition. What that means is that when they install NetWorker, their Data Domain box will recognize it.
Further, IT administrators can log onto NetWorker and control deduplication and replication directly from the NetWorker console.
"We are serious about integration," said Slootman. NetWorker can do autodiscovery, auto-configuration and more. There are few reasons for NetWorker users to log into a Data Domain appliance now.
Those currently using Data Domain can get Boost as a free plug-in for backup software such as NetBackup and Backup Exec.
Boost brings about a 20 to 40 percent drop in backup load due to fewer copy requests, said Slootman.
Dedupe Done Differently
Previously, all data was sent over the network by the backup server to the Data Domain device. Software within the Data Domain box recognized data that had been backed up before and discarded it. It only retained new files or revised documents.
This placed an unnecessary strain on the network. Traditionally, backup servers use standard protocols such as NAS (NFS/CIFS) or tape library emulation to interact with storage devices. But protocols such as NFS or CIFS were designed for network traffic, not backup. So they aren't the most efficient way of going about it.
With Boost, a different approach is taken. The backup server is able to determine what needs to be forwarded and only transmits that information. According to Slootman, that means 80 to 99 percent less backup LAN bandwidth, and a 50 percent faster Data Domain box. This is accomplished via a software client that transfers the compute-intensive work from the deduplication appliance to the backup server.
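The difference between the two approaches can be illustrated with a minimal sketch. It is not EMC's implementation or the Boost protocol: the class and function names, the fixed 4 KB chunk size and the use of SHA-256 fingerprints are all assumptions made for illustration. The backup server fingerprints each chunk, asks the target which fingerprints it lacks, and sends only those chunks.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for this sketch


class DedupeTarget:
    """Toy stand-in for a deduplicating storage appliance (hypothetical)."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def missing(self, fingerprints):
        """Return the fingerprints the target has not stored yet."""
        return {fp for fp in fingerprints if fp not in self.chunks}

    def store(self, fingerprint, chunk):
        self.chunks[fingerprint] = chunk


def split_chunks(data, size=CHUNK_SIZE):
    """Cut the backup stream into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data, target):
    """Send only the chunks the target lacks; return (bytes_sent, recipe)."""
    chunks = split_chunks(data)
    # The recipe is the ordered list of fingerprints needed to rebuild the data.
    recipe = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = target.missing(recipe)
    sent = 0
    for fp, chunk in zip(recipe, chunks):
        if fp in needed:
            target.store(fp, chunk)
            sent += len(chunk)
            needed.discard(fp)  # do not resend a duplicate within the same backup
    return sent, recipe


def restore(recipe, target):
    """Rebuild the original data from its recipe."""
    return b"".join(target.chunks[fp] for fp in recipe)


if __name__ == "__main__":
    target = DedupeTarget()
    monday = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
    tuesday = b"A" * 4096 + b"B" * 4096 + b"D" * 4096  # one chunk changed
    print(backup(monday, target)[0])   # bytes sent for the first backup
    print(backup(tuesday, target)[0])  # bytes sent for the second backup
```

In this toy run the second backup sends a single chunk, because the other two are already on the target. The sketch counts only chunk bytes and ignores the much smaller fingerprint traffic. The savings in a real environment depend on how much data changes between backups.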
Ironically, this all came about due to NetApp and Symantec developing Open Storage Technology (OST). This was developed as an application programming interface (API) for Symantec's NetBackup and Backup Exec products to allow them to interface with a wide range of storage devices.
Data Domain began to take advantage of OST by working with Symantec to integrate with NetBackup. The reasoning was simple. Almost a quarter of the Data Domain installed base uses that Symantec product.
Boost isn't quite the same as OST, but it is based on it. It extends OST's benefits beyond Symantec backup products, while making it faster and more efficient.
Slootman gave an example of the benefits on a Data Domain DD880 device. In this case, Boost improved the throughput from 5.5 TB per hour to 8 TB per hour. "Dedupe changes everything operationally and cost-wise," said Slootman.
So much so that deduplication has taken a lot of business away from tape. In fact, Data Domain made a name for itself with its "Tape Sucks" marketing campaign. "As overall tape sales continue to fall, and Avamar and Data Domain are growing extremely rapidly (100% year over year), it brings to mind a process known as creative destruction (a term popularized by the late Austrian economist Joseph Schumpeter) where one product replaces another," said Hill.
What does this mean to EMC's bottom line? EMC chairman Joe Tucci regularly boasts about deduplication income from Data Domain and Avamar in his earnings calls. Meanwhile, Slootman predicts that Data Domain might beat $1 billion in sales this year. "Demand is huge," said Slootman.