As in the real estate market, the value of data is determined in large part by location.
You want to store data as close to its users as possible. That may be on-premises, if you have a centralized organization, or in your branch offices, retail outlets, factories or labs.
But if your applications run in the cloud, you may want your data to be in the cloud as well. And if your data is on tape, it may not be as accessible as you think.
Local compliance rules and regulations – think GDPR – may stop you from moving at least some of your data to the cloud. You need to know where your data is, and you need to be able to find and delete “personal data” (including copies and backups) from all of your systems, because GDPR gives people the “right to be forgotten.”
As long as the major cloud providers have not solved the compliance and “where is my data?” issues, you will probably need to store specific data locally, behind your own firewall. Or you may want to work with a trusted local service provider that guarantees your data will stay in your own region for compliance reasons, stored according to your organization’s requirements, rather than storing your sensitive data on an unprotected S3 server.
Whatever option you choose, the location of your data has a huge impact on cost, speed and durability. To find the best location for your data, you need to know what that data is, who will use it, and what will access it. To make an educated decision, start by understanding whether the data is hot, warm or cold.
         
Hot, warm and cold data
Cold data is typically old data, or data that has not been requested for a long time. Think of most of your office files written in Word, Excel, and PowerPoint. Do they really need to be in your office, on your employees’ laptops and in your backups? Most of your data is cold – 60-70% of the data that is stored is only read once.
This cold data consumes expensive primary storage. Tape storage or the cloud may be a better place for your archived, untouched data.
Warm data – the data you use more than once, or the data you need for research and analysis – is something else. The cloud may limit the performance you get when working with this data, or regulations may prohibit use of the cloud. You should also take into account the cost of transferring data back out of the cloud.
If you use applications that live in the cloud, the cloud may be the best place to store your data. But if you run your applications with your local provider or in your own data center, you will likely want your data to be there.
Hot data is used intensively: the latest report or video, production numbers, transactions, databases. This data has its own requirements (typically block storage) and demands performance and speed, most likely from Flash or SSD technology. That is exactly why you invested in these expensive machines. But along the way you and many others have been adding less valuable data that decreases performance and increases cost.
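To make the hot, warm and cold distinction concrete, here is a minimal Python sketch that sorts files into tiers based on when they were last accessed and how often they have been read. The thresholds (7 and 90 days), the `FileStats` record and the `classify` function are illustrative assumptions, not part of any product; adjust them to your own access patterns.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds (assumptions for illustration only).
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)


@dataclass
class FileStats:
    """Access statistics for one file (hypothetical record layout)."""
    name: str
    last_access: datetime
    read_count: int


def classify(stats: FileStats, now: datetime) -> str:
    """Return 'hot', 'warm' or 'cold' for one file."""
    age = now - stats.last_access
    # Hot: touched within the last few days.
    if age <= HOT_WINDOW:
        return "hot"
    # Warm: read more than once and still accessed fairly recently.
    if age <= WARM_WINDOW and stats.read_count > 1:
        return "warm"
    # Cold: everything else, including data that was read only once.
    return "cold"
```

A report generated from such a classification can show how much cold data is sitting on expensive primary storage before you decide where to move it.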
GPS to track your data
It does not matter whether the data is hot, warm or cold: you still need to know where that data is, and protect it.
Enter object storage. Compared to traditional storage, object storage has a lot of common sense already built in. All data is encrypted, at rest and in motion. The system distributes data over several nodes, in different locations, based on your requirements. And object storage is built to store unstructured data, which is 80% of the data stored by enterprises today.
Cloudian’s object storage platform HyperStore has a “data GPS” that shows you where your data is, down to which disk it’s on, in which server, and in which rack. This “data GPS” provides a partial solution to one important part of cyber-security and data protection: knowing where your data is at all times.
So, where is your data heading?
And why should you care?
As mentioned before, GDPR requires that you “know where your data is,” which becomes tricky when you’re dealing with cloud systems whose very design encourages you not to care about exactly where your data resides. If you care that your data is on a specific server, then that server can’t be quickly and easily replaced as a commodity part. Disks die all the time, and servers are constantly upgraded to newer, faster models.
But does it really matter where in a datacenter the server is?
If the server stops being in the datacenter because it’s been stolen, yes, you do care. Because data isn’t physical, “moving” it requires copying it first, then deleting the copy you don’t need. You need some level of assurance that the deletion actually happened.
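The copy, verify, delete sequence described above can be sketched in a few lines of Python. This is a simplified illustration for files on a local file system, not how any particular storage platform implements it; the function names are hypothetical. Note that removing a file this way only confirms it is no longer reachable at its old path. It does not securely erase the underlying disk blocks, which is a separate problem.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_with_verification(src: Path, dst: Path) -> str:
    """Copy src to dst, verify the copy, then delete src.

    Returns the checksum so it can be recorded in an audit log.
    """
    expected = sha256_of(src)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)

    # Verify the copy before touching the original.
    if sha256_of(dst) != expected:
        dst.unlink()
        raise IOError(f"checksum mismatch after copying {src} to {dst}")

    # Delete the original and confirm it is really gone from this path.
    src.unlink()
    if src.exists():
        raise IOError(f"failed to delete {src}")
    return expected
```

Recording the returned checksum together with the old and new locations gives you a simple audit trail showing that the move, including the deletion, actually took place.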
Object storage allows for hyper-converged access. Your customers, colleagues and partners can get to their data, independent of location. You can make temporary copies to any node in the network at any time, so your data is always close to the user, application or machine that needs it. But with that ability to put the data where it needs to be comes control – you always know where the data is stored physically.
It all comes back to control. Do you have control over where your data gets stored, and over who can access it, copy it or change it? In today’s world – where both data accessibility and compliance with regulation are mission-critical – location is everything.
To learn more about Cloudian HyperStore, visit the HyperStore page.