Backup and restoration
Cassandra provides a simple backup tool, nodetool snapshot, to take snapshots and
back up data. The snapshot command flushes MemTables to disk and creates a backup
by creating hard links to the SSTables (this is safe because SSTables are immutable).
Note
A hard link is a directory entry associated with file data on a filesystem. It can roughly be
thought of as an alias to a file that refers to the location where the data is stored. It is unlike
a soft link, which only aliases a filename, not the actual underlying data.
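The difference between the two kinds of link can be seen in a short Python sketch. It assumes a POSIX filesystem, and the file names (data.db, hard.db, soft.db) are made up for illustration; they are not Cassandra file names.

```python
import os
import tempfile


def demo_links(workdir):
    """Show that a hard link keeps the data alive after the original name is removed."""
    original = os.path.join(workdir, "data.db")  # hypothetical file name
    hard = os.path.join(workdir, "hard.db")
    soft = os.path.join(workdir, "soft.db")

    with open(original, "w") as f:
        f.write("sstable bytes")

    os.link(original, hard)     # second directory entry for the same data
    os.symlink(original, soft)  # alias for the filename only

    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    os.remove(original)         # drop the original name

    with open(hard) as f:
        hard_content = f.read()  # still readable through the hard link
    soft_dangling = not os.path.exists(soft)  # the symlink now points at nothing

    return same_inode, hard_content, soft_dangling


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_links(d))
```

Removing the original name leaves the data reachable through the hard link, while the soft link is left dangling.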
These hard links are placed under the data directory, in <keyspace>/
<column_family>/snapshots.
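To see which snapshots exist on a node, you can walk this layout. The following is a minimal sketch: the function name find_snapshots is hypothetical, and it assumes that each snapshot is a subdirectory of snapshots. The column family directory name is reported exactly as it appears on disk, which on some Cassandra versions carries a suffix after the name.

```python
from pathlib import Path


def find_snapshots(data_dir):
    """Return {(keyspace, column_family_dir): [snapshot names]} found under data_dir."""
    found = {}
    # Layout assumed: <data_dir>/<keyspace>/<column_family>/snapshots/<snapshot_name>
    for snap in sorted(Path(data_dir).glob("*/*/snapshots/*")):
        if snap.is_dir():
            column_family = snap.parent.parent.name
            keyspace = snap.parent.parent.parent.name
            found.setdefault((keyspace, column_family), []).append(snap.name)
    return found
```

Pass the node's data directory as data_dir; its location depends on your configuration.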
The general plan to back up a cluster roughly follows these steps:
1. Take a snapshot of each node one by one. The snapshot command provides an
option to specify whether to back up the entire keyspace or just the selected
column families.
2. Taking a snapshot is just half of the story. To be able to restore the database at a
later point, you need to move these snapshots to a location that cannot be affected
by the node's hardware failure or the node's unavailability. One of the easiest
things to do is to move the data to network-attached storage. For AWS users, it is
fairly common to back up the snapshots to an S3 bucket.
3. Once you are done backing up the snapshots, you need to clear them. The
nodetool clearsnapshot command removes all the snapshots on a node.
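The three steps above can be sketched for a single node as follows. This is an illustration under stated assumptions, not a production script: the helper names (snapshot_command, clear_command, backup_node) are hypothetical, the copy to a safe location is stood in for by a plain directory copy (for example, to a mounted network share) rather than an S3 upload, and the -t (snapshot name) and -cf (column family) options should be checked against the nodetool help of your Cassandra version.

```python
import shutil
import subprocess
from pathlib import Path


def snapshot_command(tag, keyspace, column_family=None):
    """Build the nodetool command for step 1."""
    cmd = ["nodetool", "snapshot", "-t", tag]
    if column_family:
        cmd += ["-cf", column_family]  # restrict to one column family
    cmd.append(keyspace)
    return cmd


def clear_command(tag, keyspace):
    """Build the nodetool command for step 3 (clears only the named snapshot)."""
    return ["nodetool", "clearsnapshot", "-t", tag, keyspace]


def backup_node(data_dir, backup_root, tag, keyspace, run=subprocess.check_call):
    """Snapshot a keyspace, copy the snapshot away, then clear it on the node."""
    run(snapshot_command(tag, keyspace))                    # step 1: take the snapshot
    copied = []
    for snap in sorted(Path(data_dir).glob(f"{keyspace}/*/snapshots/{tag}")):
        column_family = snap.parent.parent.name
        dest = Path(backup_root) / keyspace / column_family / tag
        shutil.copytree(snap, dest)                         # step 2: copy to a safe place
        copied.append(dest)
    run(clear_command(tag, keyspace))                       # step 3: clear the snapshot
    return copied
```

The run parameter exists so that the commands can be replaced with a stub when trying the sketch on a machine without Cassandra. Note that the snapshot is cleared only after the copy has finished; if the copy raises an error, the snapshot stays on the node.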
It is important to understand that taking a snapshot creates hard links to the data files.
Because of these links, the data files are not deleted from disk when they become obsolete
(for example, after compaction). This unnecessary disk usage can be avoided by running
nodetool clearsnapshot after the snapshots have been copied to a different location.
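One rough way to estimate how much disk space is held only by snapshots is to look at the link count of the snapshot files. A file whose link count is 1 is no longer referenced by a live SSTable name, so its space would be freed by clearing the snapshot. The sketch below assumes the <keyspace>/<column_family>/snapshots layout described earlier and a hypothetical function name; small files that the snapshot itself writes into its directory also have a link count of 1 and are included in the total.

```python
from pathlib import Path


def snapshot_only_bytes(data_dir):
    """Sum the size of snapshot files that have no other hard link on the filesystem."""
    total = 0
    # Looks only at files directly inside each snapshot directory.
    for path in Path(data_dir).glob("*/*/snapshots/*/*"):
        if path.is_file():
            st = path.stat()
            if st.st_nlink == 1:  # the snapshot holds the only remaining link
                total += st.st_size
    return total
```

A result that keeps growing between backups is a sign that old snapshots are not being cleared.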
For really large datasets, it may be hard to back up the entire keyspace on a daily basis.
Plus, transferring large amounts of data over a network to move the snapshots to a safe
location is expensive. Instead, you can take a snapshot once and copy it to a safe location.
After that, all you need to move is the incremental data. This is called incremental backup. To en-