P-Guide, Continuous Data Protection 2010
June 1, 2010 - Is now the time to add real-time data protection to your lineup? CDP is a highly promising (and often misunderstood) backup and disaster recovery option. Here's a look at how the sector is evolving, along with details on the latest generation of solutions.
Editor's note: Two years ago, we took a big-picture look at the state of the CDP market and examined the software's role as part of a comprehensive data-protection strategy. This P-guide updates and expands upon that earlier document, focusing on how CDP solutions work and detailing the evolving landscape of the CDP sector.
Executive Summary
The last decade has seen a fundamental shift in the way mid-market companies back up and protect their data. Where tape-based solutions were once the norm, a variety of disk-based technologies have become increasingly common in the last few years. One of the most promising (and misunderstood) of those solutions is Continuous Data Protection, or CDP, as it's commonly known. For all of its potential, it has been slow to gain widespread acceptance among IT staffs. That's changing, however, as the technology has matured and been embraced by a growing range of vendors. How does CDP work? What are the differences between the types of CDP systems? How has the CDP landscape changed? This P-guide walks you through answers to all of those questions.
Introduction
CDP certainly has generated plenty of buzz in the last few years. It's not hard to see why: The technology is designed to continually back up a company's data as it is being created or modified, something that's beyond the reach of a traditional backup system. By doing so, it can practically eliminate the classic nightmare scenario of losing important data between weekly full and daily incremental backups.
Sounds good so far, right? Well, the challenge has been that CDP can be a confusing technology. For starters, there are multiple definitions floating around in the marketplace. The Storage Networking Industry Association (SNIA) has defined it as a "methodology that continuously captures or tracks data modifications and stores changes independent of the primary data, enabling recovery points from any point in the past." IDC has defined it like this: "Continuous data protection, also referred to as continuous backup, pertains to products that track and save data to disk so that information can be recovered from any point in time, even minutes ago."
While those are straightforward enough, things get confusing when you delve deeper into the categories of CDP solutions on the market. In a nutshell, there are essentially two types: near-CDP and pure (or true) CDP. There are fundamental differences between the two, but many vendors have glossed over those points in their marketing efforts, which has helped create uncertainty in the marketplace.
The software has faced other problems as well. Many of the early-stage CDP products were prohibitively expensive. Several had trouble integrating with SQL Server, Microsoft Exchange, and other applications. And there was some marketplace confusion between CDP and technologies such as array-based data snapshots and replication solutions that take data snapshots at defined intervals. The result: Plenty of IT managers were leery about adding CDP to their proven backup practices. That appears to be changing, however. The last few years have seen some big changes in the CDP space, and the technology is now beginning to mature and deliver on some of its early promises.
How CDP Works
Broadly speaking, CDP solutions use three different techniques to capture and write data changes to an in-house data center or to an offsite location via a wide-area network (WAN). The block- or byte-based data replication approach is perhaps the most common these days. It works by capturing blocks or bytes of data as they are created and examining each to see if it has been encountered before. If it is new or one of a kind, the system writes it to disk; if not, it creates a pointer to the original. In contrast, file-based CDP solutions record file-system changes, such as the creation of new files and the modification or deletion of older ones. Another type, application-based CDP, works from within a specific application and can be designed as part of the application itself. In a sense, block- and file-based solutions offer more flexibility, as they're designed to support a range of different applications. But while application-based CDP is more narrowly focused by design, it can also work in a more integrated manner and replicate data without interrupting the underlying application.
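To make the block-based approach more concrete, here is a minimal Python sketch of the "write it if it's new, otherwise point to the original" idea. It is an illustration only and not any vendor's implementation: the BlockStore class, the use of SHA-256 digests to recognize a block that has been seen before, and the in-memory dictionary standing in for disk are all assumptions made for the example.

```python
import hashlib


class BlockStore:
    """Toy block store: each unique block is written once, repeats become pointers."""

    def __init__(self):
        self.blocks = {}   # digest -> block bytes (stands in for the backup disk)
        self.journal = []  # one digest per captured block, in capture order

    def capture(self, block: bytes) -> bool:
        """Record a captured block. Return True if it was new and had to be written."""
        digest = hashlib.sha256(block).hexdigest()
        is_new = digest not in self.blocks
        if is_new:
            self.blocks[digest] = block  # first time seen: write the data itself
        self.journal.append(digest)      # always record a pointer to the stored block
        return is_new

    def replay(self) -> bytes:
        """Rebuild the captured stream by following the pointers in order."""
        return b"".join(self.blocks[d] for d in self.journal)
```

Capturing the same block twice stores its contents once and adds a second pointer to the journal, which is why this style of capture can be frugal with disk space.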
Synchronous and Asynchronous Data Replication
CDP solutions also tend to use either synchronous or asynchronous data replication. Both methods establish a baseline by copying and writing data to a storage location and then updating it on a regular basis. The key difference is that synchronous replication waits until all of the data has been recorded on the backup system before proceeding with the next write. By doing so, it ensures that both sets of data are fully synchronized. In contrast, asynchronous replication allows new writes to be accepted without waiting for the backup site to finish its writes. While it tends to be faster than synchronous replication, it also carries the downside of potential data loss during the time periods when the two storage systems are not in sync.
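The difference is easier to see in a small simulation. The Python sketch below is a simplified model, not a real replication engine: the Replica, SyncPrimary, and AsyncPrimary classes are hypothetical names, and an in-memory queue stands in for the network between the two sites.

```python
from collections import deque


class Replica:
    """Stands in for the backup system at the secondary site."""

    def __init__(self):
        self.data = []

    def write(self, record):
        self.data.append(record)


class SyncPrimary:
    """Synchronous: a write does not complete until the replica has it too."""

    def __init__(self, replica):
        self.data = []
        self.replica = replica

    def write(self, record):
        self.data.append(record)
        self.replica.write(record)  # block on the replica before returning


class AsyncPrimary:
    """Asynchronous: a write completes at once; the replica catches up later."""

    def __init__(self, replica):
        self.data = []
        self.replica = replica
        self.pending = deque()  # writes accepted but not yet replicated

    def write(self, record):
        self.data.append(record)
        self.pending.append(record)

    def flush(self, max_records=None):
        """Ship queued writes to the replica; return how many were sent."""
        sent = 0
        while self.pending and (max_records is None or sent < max_records):
            self.replica.write(self.pending.popleft())
            sent += 1
        return sent

    def at_risk(self):
        """Writes that would be lost if the primary failed right now."""
        return list(self.pending)
```

In the synchronous model the two copies never diverge, at the cost of every write waiting on the replica. In the asynchronous model, whatever at_risk() returns is the data that a failure at that moment would take with it.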
Pure CDP vs. Near-CDP
There's a good reason why pure CDP is sometimes known as true CDP: It's a ruthlessly efficient solution that instantly backs up every data change, allowing for an infinite range of recovery options (also known as APIT, short for "any point in time"). Most (but not all) pure CDP systems are high-end products used by, for example, financial institutions that deal with massive volumes of constantly changing data. As such, they require serious investments in server capacity, Storage Area Network (SAN) disks, and high-bandwidth WAN channels. At the high end, pure CDP solutions often start in the $20,000 range and run into six figures. Vendors include EMC, NetApp, Quantum, and ExaGrid.
In contrast, near-CDP solutions don't offer the same granularity that pure CDP delivers. The key difference is the APIT factor: near-CDP doesn't provide users with the ability to recover data from any point in time. It does, however, allow users to restore data to a specific point in time (PIT) through the creation of data snapshots (which the SNIA defines as "fully usable copies of a defined collection of data that contains an image of the data as it appeared at the point in time at which the copy was initiated"). Near-CDP solutions are also typically far less expensive and require considerably less disk space than pure CDP. Vendors include Acronis, CA, Double-Take, FalconStor, and Symantec, among numerous others.
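The APIT-versus-PIT distinction can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the journal is a hypothetical list of (timestamp, key, value) changes sorted by time, and the function names restore_apit and restore_pit are invented for the example rather than taken from any product.

```python
import bisect


def restore_apit(journal, t):
    """Pure CDP: replay every logged change up to and including time t."""
    state = {}
    for ts, key, value in journal:
        if ts > t:
            break
        state[key] = value
    return state


def restore_pit(journal, snapshot_times, t):
    """Near-CDP: only states captured at snapshot times can be restored.

    Returns the state as of the latest snapshot at or before t,
    or None if no snapshot had been taken yet.
    """
    i = bisect.bisect_right(snapshot_times, t)
    if i == 0:
        return None
    return restore_apit(journal, snapshot_times[i - 1])
```

With changes logged at times 1, 4, 7, and 9 and snapshots taken at times 0, 5, and 10, a request to recover the state at time 8 gets the exact state under the pure CDP model, but falls back to the snapshot from time 5 under near-CDP and so misses the change made at time 7.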
Marketplace Changes
As the CDP field has evolved, the differences between pure and near-CDP have been decreasing. One big reason why: Newer generations of near-CDP systems allow users to space their data snapshots as little as a minute apart; the closer together the snapshots, the less chance there is for lost data. While that's not as comprehensive as the always-on protection provided by pure CDP, it's an acceptable level for many users, particularly when you consider that it offers far more protection than a daily backup alone.
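A quick back-of-the-envelope calculation shows how much the snapshot interval matters. The Python sketch below assumes, purely for illustration, that copies are taken at fixed intervals starting at midnight and that everything changed since the most recent copy is lost when a failure hits; the lost_window function is a hypothetical helper.

```python
def lost_window(failure_time: int, interval: int) -> int:
    """Seconds of changes lost when copies are taken at 0, interval, 2*interval, ...

    failure_time and interval are both in seconds since midnight.
    """
    return failure_time % interval


DAILY = 24 * 60 * 60  # one backup per day, taken at midnight
ONE_MINUTE = 60       # near-CDP snapshots spaced a minute apart

# A failure at 3:59:30 p.m., expressed as seconds since midnight.
failure = 15 * 3600 + 59 * 60 + 30
```

Under these assumptions, a failure at 3:59:30 p.m. costs nearly 16 hours of changes with a midnight backup, but only the last 30 seconds with one-minute snapshots.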
The evolution of near-CDP has also driven new developments in the CDP market. One of the most important is that numerous vendors now integrate CDP directly into backup and disaster-recovery products to create a multi-tiered protection setup. Doing so allows users to manage the CDP solution through their standard system and add PIT recovery capabilities to complement or replace existing backup practices. In a similar vein, some providers also integrate CDP into unified data-protection and disaster-recovery suites that feature a mix of replication, data deduplication, and encryption technologies. And as virtualization continues to emerge, vendors also are using CDP in virtual disk environments.
CDP Solutions
CA ARCserve High Availability r15
CA ARCserve High Availability is a backup and recovery suite that features a pure CDP component for systems, applications, and data. It uses asynchronous replication to provide block-level incremental backups and bare-metal restorations (i.e., rebuilding a complete system on a machine with no working operating system after a catastrophic failure). One key benefit is the solution's ability to take rapid snapshots. What's more, since it works on the block level (only taking snapshots of changed blocks), it's highly efficient in its use of disk space. The suite itself can replicate or automatically fail over entire physical or virtual servers, features integrated virus protection, and is compatible with Linux, Unix, and Windows. While ARCserve High Availability uses either LAN or WAN connections to transmit backups, it also features an Offline Synchronization mode, which allows users to run a snapshot from the master server to removable media and then synchronize the replica server without using either a LAN or WAN. The master server compiles any changes while the offline synchronization process is taking place, and the changes are copied to the replica server and held until the synchronization is finished.
Double-Take Availability
Double-Take Availability is an asynchronous near-CDP solution that uses byte-based data replication and provides failover for physical and virtual servers. The software monitors changes made to protected files and replicates only the bytes that change. Available for Windows and Linux, the software works on physical and virtual servers. It also has an automated failover option, along with a feature called Open-file Mirroring that allows users to configure files and directories for replication without restarting or interrupting open applications. On a related note, the software also provides high-availability server failover, which ensures that individual users can remain online in the event of a failure. Another productivity-centric feature is a set of controls that allow administrators to define the amount and type of network bandwidth (T1, 128Kbps, etc.) that the software uses for data replication, thus helping avoid slowdowns. Finally, the software provides multiple data-compression options that can be individually configured for unique servers, data, and networks.
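To show why replicating only the changed bytes saves bandwidth, here is a minimal Python sketch of the general idea. It is not how Double-Take itself is implemented: this example finds the changes by comparing an old and a new copy of a file, and the byte_delta and apply_delta functions are hypothetical names introduced for the illustration.

```python
def byte_delta(old: bytes, new: bytes):
    """Return (ops, final_length), where ops is a list of (offset, changed_bytes)."""
    ops = []
    common = min(len(old), len(new))
    i = 0
    while i < common:
        if old[i] != new[i]:
            start = i
            # Extend the run while consecutive bytes keep differing.
            while i < common and old[i] != new[i]:
                i += 1
            ops.append((start, new[start:i]))
        else:
            i += 1
    if len(new) > common:
        ops.append((common, new[common:]))  # bytes appended past the old end
    return ops, len(new)


def apply_delta(replica: bytes, ops, final_length: int) -> bytes:
    """Patch the replica's copy using only the changed byte ranges."""
    buf = bytearray(replica)
    for offset, data in ops:
        buf[offset:offset + len(data)] = data
    del buf[final_length:]  # handle files that shrank
    return bytes(buf)
```

Only the contents of ops and the final length need to cross the network, which for a small edit to a large file is a tiny fraction of the file itself.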
Acronis Backup & Recovery 10
Acronis Backup & Recovery 10 is a data protection and disaster recovery solution based on the company's True Image product line that offers advanced backup and recovery functions, including data deduplication, policy-based management, and an operations dashboard. It can be configured for near-CDP functionality, allowing users to synchronously write data to a secondary SAN or Network-Attached Storage drive. The near-CDP component is flexible: it can run bare-metal restores, create full disk image backups, or use snapshot technology to run incremental backups of disks. It's also elegantly simple in its setup: It uses an agent-based approach to time-stamp backup disk images, allowing users to roll back the disk or partition to any of the incremental backups. The software is designed for Windows and Linux environments (and also works for virtualized environments), and features a user-friendly GUI along with a centralized console to manage all workstation and server backup and disaster recovery activities.
Symantec Veritas Replication Exec 3.1
Veritas Replication Exec 3.1 is a pure CDP solution that helps organizations ensure continuous, around-the-clock protection of Microsoft SQL and Exchange applications. It works by copying data from an application server to a secondary standby server in real time as files are created or changed. It replicates only the changed bytes within each data file, which allows administrators to maintain a complete record of each Exchange or SQL server that can be recovered with minimal downtime. It's also flexible: you can set up the replication to occur continuously or on a scheduled basis, such as during off-peak hours. Finally, the software features a function known as Network Outage Resiliency, which ensures that the replication process will continue right where it left off once connectivity is restored in the event of a network outage.
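The "continue right where it left off" behavior can be sketched generically in Python. This is not Symantec's implementation: the FlakyLink class is a hypothetical stand-in for an unreliable network connection, and the replicate function simply remembers the position of the last change that was sent successfully.

```python
class FlakyLink:
    """Hypothetical network link that fails on chosen send attempts."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)  # attempt numbers (1-based) that should fail
        self.attempts = 0
        self.received = []           # what the standby server has so far

    def send(self, record):
        self.attempts += 1
        if self.attempts in self.fail_on:
            raise ConnectionError("link down")
        self.received.append(record)


def replicate(changes, link, acked=0):
    """Send changes[acked:] in order; return the index of the next unsent change."""
    for record in changes[acked:]:
        try:
            link.send(record)
        except ConnectionError:
            return acked  # stop here; the caller retries from this position later
        acked += 1
    return acked
```

Because the position only advances after a successful send, calling replicate again with the returned value picks up at the first change the standby server is missing, without resending what it already has.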
Conclusion
While widespread CDP adoption has yet to take place, there are good reasons why the technology is becoming a more attractive alternative for mid-market IT departments. Given the need for mid-market IT staffs to handle rapid data growth, ever-shrinking backup timeframes, and 24/7 applications, it seems more like a question of when, rather than if, CDP will gain widespread acceptance. Your best approach in such a scenario: Arm yourself with the information to make a smart buying decision.
For more information on CDP, contact Productive Corporation:
Phone: 1.800.726.4099
Email: help@productivecorp.com
About Productive Corporation
Productive Corporation is a specialized software reseller that helps small and medium businesses across North America with software initiatives in security, storage, and infrastructure. We provide subject matter expertise, access to technical resources, and excellent customer service. We also strive to provide the most relevant resources for our customers.
About the Author
Chris Mikko is a Twin Cities-based writer and editor who specializes in technology topics.