Posted by: Phil Robinson in storage, networking, Data Protection, data, cloud computing, backups on Feb 14, 2011
This article first appeared on Information Week, and was written by George Crump.
In our last entry we discussed different ways that you can move data into the cloud, something I call onramps. In theory the ability now exists to put all your data types on a cloud storage platform, but is that the right choice for your business? How do you determine which data you should put in the cloud?
The answer, like almost everything else in IT, is that it depends. It depends on what your key internal storage challenges are and how much internal resistance there is to using an external service. Notice what is not on that list: the size of your company, the amount of IT resources you have, or the amount of data you have. While I find that it is often assumed that cloud storage is for small business owners only, there are cloud storage solutions for businesses of all sizes, including large enterprises.
The first area to examine is how much data is being accessed on a moment-by-moment basis. As you may have noticed from the discussion in our last entry, there is now an onramp or cloud gateway for almost every data type, ranging from backups to primary block storage. The moment-by-moment change rate plus the data type will determine how large the local gateway cache needs to be and how often data will need to be recalled from the cloud. The total size of the data set is for the most part irrelevant, other than the per-GB cost to store it, and that cost should be relatively static. What delays an application is the movement of data from the cloud to your local cache. The more often data can be served from local cache, whether through smart caching algorithms or a large cache space, the better. Also, several cloud storage providers charge extra for transferring data out of the cloud and back to local storage, which can lead to a surprise on your bill. Since most onramps or gateways give you a choice of provider, it makes sense to know what the hidden extras are from each provider.
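To make the trade-off between cache hits and transfer charges concrete, here is a rough back-of-the-envelope sketch in Python. The function name, its parameters, and any prices you feed it are assumptions for illustration only; they do not come from any particular provider or gateway, and real pricing usually has more tiers and fees than this.

```python
def monthly_cloud_cost(stored_gb, accessed_gb, cache_hit_ratio,
                       storage_price_per_gb, egress_price_per_gb):
    """Estimate one month of cloud storage cost behind a local gateway cache.

    All inputs are hypothetical estimates supplied by the caller:
      stored_gb            - total data kept in the cloud
      accessed_gb          - data the applications read during the month
      cache_hit_ratio      - fraction of reads served from local cache (0..1)
      storage_price_per_gb - provider's monthly price per stored GB
      egress_price_per_gb  - provider's price per GB transferred out
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")

    # Only cache misses have to be recalled from the cloud.
    recalled_gb = accessed_gb * (1.0 - cache_hit_ratio)

    # Storage cost is relatively static; egress cost moves with cache misses.
    storage_cost = stored_gb * storage_price_per_gb
    egress_cost = recalled_gb * egress_price_per_gb

    return {
        "recalled_gb": recalled_gb,
        "storage_cost": storage_cost,
        "egress_cost": egress_cost,
        "total": storage_cost + egress_cost,
    }
```

Running this with the same data set and two different cache hit ratios shows how the storage line stays flat while the transfer line, the one that surprises people, changes with how well the cache works.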
The impact of restoring data from the cloud, and its potential extra cost, is one of the reasons that backup and archive data have been such popular candidates for cloud storage. The transfer is almost always one way: upload. Also, most big recoveries can happen from the local cache and don’t need the data stored in the cloud. The backup copy in the cloud mostly serves as a long-term retention area. As you move into using cloud storage for primary data, the transfer issues become a bit more thorny. The easiest primary data set to deal with is the file share. Most files on a file server are only active for a few days and then become dormant. This is an ideal use case for cloud storage: let the older files migrate to the cloud. Even if they do need to be recalled from cloud storage later, only a single user is typically impacted by the delay in access, and a single file access is relatively fast.
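If you want a feel for how much of a file share is actually dormant, a small script can give a first approximation. The sketch below is an illustration only: the function name and the idea of a fixed day threshold are my assumptions, not how any specific gateway decides what to migrate. It also leans on file access times, which some systems do not update reliably, so it takes the later of the access and modification times.

```python
import os
import time
from pathlib import Path


def find_dormant_files(root, dormant_days, now=None):
    """Return files under root that have not been used for dormant_days.

    'Used' here means the later of last access and last modification.
    Access times can be stale on volumes mounted with options such as
    noatime, so treat the result as an estimate.
    """
    now = time.time() if now is None else now
    cutoff = now - dormant_days * 86400  # 86400 seconds per day

    dormant = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            info = path.stat()
            last_used = max(info.st_atime, info.st_mtime)
            if last_used < cutoff:
                dormant.append(path)
    return sorted(dormant)
```

Summing the sizes of the returned files against the total size of the share gives a rough idea of how much data could live in the cloud without anyone noticing most days.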
Databases are a bit trickier. Here, look for applications in which only a small portion of the application's data is accessed on a regular basis. Microsoft SharePoint is a good example of a “ready for cloud now” data set, as potentially are some mail systems that store attachments and messages as discrete files. In the near future, don’t rule out busy transaction-oriented databases. As the developers of these platforms embrace the availability of cloud storage, they can build in ways to automatically segment data into tiers so that each tier can be stored on a different storage type, and the cloud could be one of those types.