Maintenance - Practical Cassandra


nodetool -h 10.100.1.110 setstreamthroughput 16 && nodetool -h 10.100.2.120 setstreamthroughput 16

Backup and Restore

Making backups in Cassandra is a little tricky. The first thing to keep in mind is that a distributed system usually involves not just a lot of data but also a lot of machines on which that data resides. So whatever you choose as a storage medium for backups, make sure it has plenty of space.
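As a rough sizing estimate (a back-of-the-envelope approximation, not a rule from this book): each node's backup covers the data stored on that node, and every row is stored on as many nodes as the replication factor RF. So with N nodes, where S_i is the on-disk data on node i and S_data is the size of one on-disk copy of the whole data set, the combined backup size is roughly:

$$
S_{\text{backup}} \approx \sum_{i=1}^{N} S_i \approx \mathrm{RF} \times S_{\text{data}}
$$

For example, a hypothetical data set that occupies 500 GB per copy, stored with RF = 3, needs on the order of 1.5 TB of backup storage for a single full backup, and more if several backups are retained or if overwritten data has not yet been compacted away.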

Are Backups Necessary?

There is some debate as to whether backups are even necessary in a large enough distributed system. While making regular backups is good practice, it may not be a requirement for your system. And if backups are not a major requirement, the overall complexity and storage requirements of your architecture can be drastically reduced.

There are certain situations where you can get away with not having a backup. But as with any major decision, there are trade-offs. As a reminder, a replication factor of 3 means that a copy of the data exists on three separate nodes. In Amazon Web Services terms, if two of those nodes are in separate availability zones (us-east-1a and us-east-1b) and the third node is in a different region (us-west-1a), the likelihood of all three nodes in that replica set failing at once is rather low. But since there is still a chance, it is a decision you have to make based on your data requirements.
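To put "rather low" into numbers, here is a worked example under two simplifying assumptions that are not from the text: each of the three replica nodes is down with the same hypothetical probability p at any given moment, and the failures are independent (which is what spreading replicas across availability zones and regions aims for):

$$
P(\text{all three replicas down}) = p \cdot p \cdot p = p^{3}
$$

With a hypothetical p = 0.01, this gives 0.01^3 = 10^-6, or one in a million. Anything that affects all replicas at once breaks the independence assumption, so treat the figure as optimistic.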

Snapshots

The major risk you take without backups is data problems. And in a system that is bleeding edge and still in heavy development, such as Cassandra, there is always a possibility of problems. So let's assume that with your architecture and data set size (or whatever your reasons are), backups are a requirement. In Cassandra, backups are done using snapshots.
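A minimal sketch of snapshotting a whole cluster, assuming nodetool's `snapshot` command with its `-t` tag option and the `-h` host flag used earlier in this chapter. The function name `snapshot_all`, the tag, and the host list are hypothetical. Giving every node the same tag makes the matching snapshots easy to locate later. The `NODETOOL` variable is also an assumption, added so the commands can be dry-run by setting it to `echo`.

```shell
# Hypothetical helper: snapshot each listed node under one shared tag.
# Usage: snapshot_all <tag> <host> [<host> ...]
# NODETOOL defaults to "nodetool"; set NODETOOL=echo to print the
# commands instead of running them (an assumption for dry runs).
snapshot_all() {
  tag=$1
  shift
  for host in "$@"; do
    # Each node snapshots only the SSTables it holds locally.
    # Stop at the first node whose snapshot fails.
    ${NODETOOL:-nodetool} -h "$host" snapshot -t "$tag" || return 1
  done
}
```

For example, `snapshot_all nightly-backup 10.100.1.110 10.100.2.120` snapshots both nodes under the tag nightly-backup. The nodes are handled one after another, so the snapshots are not taken at exactly the same instant. After the files have been copied off each node, `nodetool -h <host> clearsnapshot -t nightly-backup` removes them.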

When Cassandra data is stored on disk, there are many SSTables per ColumnFamily and many files per SSTable. And that is just on a single node containing a subset of the data. To simplify the backup process, the concept of snapshots was created. The purpose of a snapshot is to make a copy of some or all of the data on a node. After the snapshot is created, it can be easily copied or removed
