
Data Protection in the Cloud

Primary Data Protection

Primary data is data that supports online processing. It can be protected with a single technology or with a combination of several. Common methods include RAID (at its various levels), multiple copies, replication, snap copies, and continuous data protection (CDP).

Primary data protection within the mass market cloud is usually left to the user. The methods listed above are rarely found in mass market clouds today because of their complexity and cost. A few cloud storage solutions protect primary data by maintaining multiple copies of the data within the cloud on non-RAID-protected storage in order to keep costs down.

Primary data protection in the enterprise cloud should resemble an in-house enterprise solution. Robust technologies like snap copies and replication should be available when a business impact analysis (BIA) of the solution calls for them. APIs for manipulating the environment are critical in this area so that the data protection method can be tightly coupled with the application.
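To illustrate what coupling data protection to the application can look like, here is a minimal sketch in Python. It assumes the application can pause and resume writes and that the provider exposes a call to create a snap copy. FakeApp, FakeStorageApi, and every method name on them are hypothetical in-memory stand-ins, not any real provider's API.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class FakeApp:
    """Hypothetical application that can pause and resume its writes."""
    log: List[str] = field(default_factory=list)

    def quiesce(self) -> None:
        self.log.append("quiesce")

    def resume(self) -> None:
        self.log.append("resume")


@dataclass
class FakeStorageApi:
    """Hypothetical in-memory stand-in for a provider's snap copy API."""
    snapshots: List[str] = field(default_factory=list)

    def create_snapshot(self, volume_id: str, label: str) -> str:
        snap_id = f"{volume_id}-{label}"
        self.snapshots.append(snap_id)
        return snap_id


@contextmanager
def quiesced(app: FakeApp) -> Iterator[None]:
    """Pause application writes and always resume them, even on error."""
    app.quiesce()
    try:
        yield
    finally:
        app.resume()


def consistent_snapshot(app: FakeApp, storage: FakeStorageApi,
                        volume_id: str, label: str) -> str:
    """Take the snap copy only while the application is quiesced."""
    with quiesced(app):
        app.log.append("snapshot")
        return storage.create_snapshot(volume_id, label)
```

The point of the sketch is the ordering: because the snap copy is requested through an API, the application can take it at a moment when its own data is in a consistent state, rather than on a schedule it has no control over.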

The main difference between in-house enterprise solutions and storage in an enterprise cloud is how the solution is bundled. To maintain the cloud experience of deployment on demand, options must be packaged together so the service can be provisioned automatically. The result is a pick list of bundled options that together meet a wide variety of requirements. A customer may not find an exact match for the required frequency of snapshots, replication, and the like. Nonetheless, most users will sacrifice some flexibility to realize the other benefits of operating within an enterprise cloud.
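The pick-list idea can be sketched as a small catalog lookup. The bundle names, snapshot intervals, and costs below are invented purely for illustration, and the selection rule (the cheapest bundle that meets or exceeds the requirement) is an assumption, not a description of any particular provider.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bundle:
    name: str
    snapshot_interval_hours: int  # time between snap copies
    replicated: bool              # whether data is replicated off-site
    monthly_cost: float           # illustrative units only


# Hypothetical catalog; all values are made up for this example.
CATALOG: List[Bundle] = [
    Bundle("bronze", 24, False, 100.0),
    Bundle("silver", 4, False, 250.0),
    Bundle("gold", 1, True, 600.0),
]


def pick_bundle(max_interval_hours: int, needs_replication: bool,
                catalog: List[Bundle] = CATALOG) -> Optional[Bundle]:
    """Return the cheapest bundle that satisfies the requirement, or None."""
    candidates = [
        b for b in catalog
        if b.snapshot_interval_hours <= max_interval_hours
        and (b.replicated or not needs_replication)
    ]
    return min(candidates, key=lambda b: b.monthly_cost, default=None)
```

With this catalog, a customer who needs a snap copy at least every six hours lands on the "silver" bundle, which snapshots every four. The requirement is met but not matched exactly, which is the kind of trade-off described above.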


Secondary Data Protection

Secondary data consists of historical copies of primary data in the form of backups. This type of data protection is meant to mitigate data corruption, recover deleted or overwritten data, and retain data over the long term to meet business or regulatory requirements. Typical solutions include backup software and several types of storage media. Data de-duplication might also be used, but this can raise issues in a multi-tenant environment regarding the segregation of data.
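One way to picture the segregation issue is a de-duplication index that is scoped per tenant. The sketch below is an illustrative assumption about how that could be done, not how any specific backup product works. The class name and its methods are hypothetical; only hashlib.sha256 is a real library call.

```python
import hashlib
from collections import defaultdict
from typing import Dict


class TenantScopedDedupStore:
    """De-duplicates identical chunks, but never across tenants."""

    def __init__(self) -> None:
        # tenant -> {sha256 hex digest -> chunk bytes}
        self._chunks: Dict[str, Dict[str, bytes]] = defaultdict(dict)

    def put(self, tenant: str, chunk: bytes) -> str:
        """Store a chunk once per tenant and return its digest."""
        digest = hashlib.sha256(chunk).hexdigest()
        self._chunks[tenant].setdefault(digest, chunk)
        return digest

    def get(self, tenant: str, digest: str) -> bytes:
        """Raise KeyError unless this tenant stored the chunk itself."""
        return self._chunks.get(tenant, {})[digest]

    def stored_chunks(self, tenant: str) -> int:
        return len(self._chunks.get(tenant, {}))
```

Scoping the index this way gives up the space savings of de-duplicating across customers, but one tenant can never retrieve, or infer the existence of, a chunk that only another tenant has stored.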

There are solutions (commercial and public-domain) that can be added to mass market cloud storage offerings to accomplish secondary data protection, but it is rare for the mass market cloud providers to package this together with the online storage. Although the reasons vary, in some instances SLAs related to restore times and retention periods can be difficult to manage.

Whether the solution is a private or a multi-tenant cloud platform, control, visibility, and restore SLAs are critical for secondary data protection. Initiating a restore should be straightforward, and the restore should proceed automatically once the request is submitted. Users should be able to count on a predictable level of restore performance (gigabytes restored per unit of time) and should be able to select the length of retention from a short pick list of options. Finally, users should be able to check on the status of their backups online. Since frequency and retention determine the resources required for storing backups, and thus the cost, usage and billing status should be viewable online by the consumer to avoid surprises at the end of the billing period.
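The relationship between restore rate, retention, and cost can be made concrete with some simple arithmetic. The sketch below assumes a deliberately simple backup model: one full backup plus one incremental per additional retained day, with no de-duplication or compression. The function names, the model, and the figures in the usage note are illustrative assumptions only.

```python
def restore_hours(data_gb: float, rate_gb_per_hour: float) -> float:
    """Time to restore data_gb at a guaranteed restore rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("restore rate must be positive")
    return data_gb / rate_gb_per_hour


def retained_backup_gb(full_gb: float, daily_change_rate: float,
                       retention_days: int) -> float:
    """Storage for one full backup plus daily incrementals.

    Assumes each incremental is full_gb * daily_change_rate in size and
    that one incremental is kept for every retained day after the first.
    """
    if retention_days < 1:
        raise ValueError("retention must be at least one day")
    incrementals = retention_days - 1
    return full_gb * (1 + daily_change_rate * incrementals)


def monthly_backup_cost(stored_gb: float, price_per_gb: float) -> float:
    """Storage charge for the retained backups at a flat per-GB price."""
    return stored_gb * price_per_gb
```

For example, under these assumptions a 1,000 GB data set with a 2% daily change rate and 30 days of retention occupies about 1,580 GB of backup storage, and restoring 500 GB at 100 GB per hour takes 5 hours. Lengthening retention or backing up more often raises the stored volume, which is why usage and billing need to be visible before the bill arrives.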