Splunk provides big data solutions for cloud, on-premises, and hybrid environments. Splunk management capabilities include data collection, querying, indexing, and visualization. To help you prioritize data backup, Splunk architecture categorizes data according to lifecycle stages. The result is a system that includes hot, warm, cold, and frozen buckets.
There are two primary strategies for protecting your data. You can back up Splunk index data to an on-premises storage device using the Splunk data lifecycle stages, or you can use the SmartStore indexer capability to store data in cloud storage such as Amazon S3, or on local S3-compatible storage devices.
This is part of a series of articles about Splunk Architecture.
Splunk is a highly scalable distributed system that indexes and searches log files. It can collect huge volumes of log data, parse it, and analyze it to provide operational intelligence. Splunk’s main benefit is that it does not require external databases or data management: it uses its own indexes and distributed storage clusters, and can handle any scale of log data.
Like any enterprise system, Splunk must be supported by a data backup plan. However, you will probably not need to back up all Splunk data, because much of it may have low value. For this reason, Splunk provides a system for transitioning data between four types of storage buckets, representing different stages in the data lifecycle.
Splunk indexed data is located in database directories, divided into subdirectories called buckets. As time goes by, Splunk performs storage tiering, moving data through several types of buckets, which represent four tiers—hot, warm, cold and frozen.
In the simplified process, data moves from hot to warm, then to cold, and finally to frozen. Note that Splunk allows you to customize almost every aspect of the data lifecycle, so your process may be different.
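To summarize, your backup strategy should primarily consider warm buckets. Cold buckets may also be backed up in some circumstances. Hot buckets are still being written to, so they should not be copied directly as part of a routine backup, and frozen buckets are by default deleted, or moved to an archive if you configure one.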
There are two ways to perform a backup of a Splunk bucket:
Incremental backups
Splunk recommends scheduling regular backups of any new warm buckets, using a third-party incremental backup utility. If your buckets roll frequently, also include the colddb directory in your backup schedule, so that you don’t miss any buckets that rolled from warm to cold before they were backed up.
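The incremental approach can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a proper backup utility. It assumes that rolled (warm or cold) bucket directories use the usual db_ or rb_ name prefix while hot buckets start with hot_, and that a bucket no longer changes once it has rolled, so a bucket already present in the backup can be skipped. The function names are hypothetical.

```python
import shutil
from pathlib import Path


def is_rolled_bucket(name: str) -> bool:
    """Assumption: warm and cold buckets start with 'db_' (or 'rb_' for
    replicated copies); hot buckets start with 'hot_' and are skipped."""
    return name.startswith(("db_", "rb_"))


def backup_new_buckets(index_dir: Path, backup_dir: Path) -> list[str]:
    """Copy rolled buckets that are not yet present in backup_dir.

    Returns the names of the buckets copied in this run.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for bucket in sorted(index_dir.iterdir()):
        if not bucket.is_dir() or not is_rolled_bucket(bucket.name):
            continue
        target = backup_dir / bucket.name
        if target.exists():
            continue  # already backed up in an earlier run
        # Copy under a temporary name first, so an interrupted run
        # never leaves a partial bucket that looks complete.
        tmp = backup_dir / (bucket.name + ".partial")
        if tmp.exists():
            shutil.rmtree(tmp)
        shutil.copytree(bucket, tmp)
        tmp.rename(target)
        copied.append(bucket.name)
    return copied
```

You would run this on a schedule against an index's db directory (and its colddb directory, if buckets roll quickly), for example backup_new_buckets(Path("/opt/splunk/var/lib/splunk/defaultdb/db"), Path("/mnt/backup/defaultdb")). Both paths are examples only and depend on your installation.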
If there is a need to back up hot buckets, take a snapshot of their files, using a tool like Windows VSS or ZFS snapshots. You can also manually roll a hot bucket to a warm bucket and then configure the backup, but this is not recommended.
Back up all data
Splunk strongly recommends backing up all data, including hot, warm, and cold buckets, before you upgrade the indexer. There are a number of ways to do so, depending on the size of your dataset and how much downtime is reasonable for your Splunk deployment.
If you already have incremental backups of warm buckets, you only need to worry about hot buckets before an indexer upgrade.
Follow these best practices to safely back up your Splunk deployment.
Back up Splunk configuration files
It is important to keep regular backups of your Splunk configuration files, including saved searches, user accounts, tags, and custom sources. Splunk data buckets will not be useful without your custom configuration. Configuration files are stored in the $SPLUNK_HOME/etc/ directory and its subdirectories.
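As a rough sketch of a configuration backup, the following Python function archives the etc directory under a given Splunk home into a timestamped tar.gz file. The function name, the archive naming scheme, and the destination directory are assumptions made for illustration. In practice you would point the destination at a remote or otherwise separate location.

```python
import tarfile
import time
from pathlib import Path


def backup_config(splunk_home: Path, dest_dir: Path) -> Path:
    """Write a timestamped .tar.gz of <splunk_home>/etc into dest_dir."""
    etc_dir = splunk_home / "etc"
    if not etc_dir.is_dir():
        raise FileNotFoundError(f"no etc directory under {splunk_home}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    archive = dest_dir / f"splunk-etc-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps archive paths relative ("etc/...") instead of
        # embedding the absolute path of this machine.
        tar.add(etc_dir, arcname="etc")
    return archive
```

Because each run produces a new file, earlier archives remain available if you need to return to a previous configuration state.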
Back up to a remote location
To ensure durability in case of complete site failure, copy your configuration to a remote location. If this is not possible, at least back up files to a different part of your data center, on a different machine or a different physical disk, to reduce single points of failure.
Back up single points of failure
If your Splunk deployment has one indexer, one search head, or critical utility resources like a deployment server, license server or master node, ensure they are backed up. Test your restore procedure to ensure you can quickly restore the system in case of disaster.
Back up at least one search head cluster (SHC)
Periodically back up the state of your SHC, to ensure you can re-establish knowledge items in their current state in case of disaster.
Use version control
It’s extremely important to be able to save multiple versions of your configuration and other data, so you can roll back to a specific previous version if needed. There are three ways to do this:
Splunk SmartStore is an indexer capability that lets you use remote object stores to store indexed data. This includes Amazon S3, other cloud services that support the S3 API, and on-premises S3-compatible storage devices like Cloudian’s private cloud storage.
SmartStore has several advantages compared to traditional on-premises Splunk backup, as described above:
The SmartStore indexer is especially useful for Enterprise Splunk operations that go through massive amounts of data. Splunk’s SmartStore provides enterprises with more control and scaling options.
Splunk SmartStore and Cloudian HyperStore create an on-prem storage pool, which is separate from Splunk indexers, and is scalable for huge data stores that reach exabytes. Here’s what you get when you combine Splunk and Cloudian: