Managing a Cluster – Scaling, Node Repair, and Backup - Mastering Apache Cassandra

Chapter 6. Managing a Cluster - Scaling, Node Repair, and Backup

As a system grows, as an application matures, when the cloud infrastructure starts to warn
us about the failure of underlying hardware, or when you get hit by the TechCrunch effect,
you may need to do one of these things: repair, back up, or scale up or down. Alternatively,
the management might decide to set up another data center just for the analysis of data
(maybe using Hadoop) without affecting the user experience, which continues to be served
from the existing data center. These tasks are an integral part of a system administrator's
day job. Fortunately, all of them are fairly easy in Cassandra, and there is a lot of
documentation available for them.

In this chapter, we will go through Cassandra's built-in DevOps tool and discuss how to
scale a cluster up and shrink it down. We will also see how to replace a dead node, or
simply remove it and let the other nodes bear the extra load. Further, we will briefly look
at backup and restoration of Cassandra data. We will also observe how virtual nodes take
away the burden of manually rebalancing the cluster, which used to be a source of headaches
in versions before Cassandra 1.2. You will still have to balance nodes yourself if you
decide not to use virtual nodes.

Most of these tasks are mechanical and simple to automate. Maintaining a large cluster of
nodes entirely by hand is a burden, and sooner or later you will make a mistake.

Note

Note that the default installation of Cassandra Version 1.2 onwards uses virtual nodes
(vnodes), but as of Cassandra 2.1.x, one may still opt out of vnodes (which is not a good
idea). A major part of this chapter deals with initial_token, load balancing, token
distribution, and token generation. None of these is required if you are using vnodes.
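
To give a feel for what manual token generation involves when vnodes are not used, here is a minimal sketch. It assumes the Murmur3Partitioner (whose token range is -2^63 to 2^63 - 1), a single data center, and one token per node. The function name murmur3_initial_tokens is hypothetical and used only for illustration; it is not part of Cassandra.

```python
def murmur3_initial_tokens(node_count):
    """Return evenly spaced tokens, one per node, across the
    Murmur3Partitioner range [-2**63, 2**63 - 1].

    Assumption: a single data center and equally sized nodes.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Split the 2**64-wide ring into equal slices, starting at the minimum token.
    slice_width = 2**64 // node_count
    return [slice_width * i - 2**63 for i in range(node_count)]


if __name__ == "__main__":
    # Each printed value would be assigned to one node as its initial_token.
    for index, token in enumerate(murmur3_initial_tokens(4)):
        print(f"node {index}: initial_token = {token}")
```

Every time a node is added or removed without vnodes, these values must be recalculated and the existing nodes moved to their new tokens, which is the rebalancing work that vnodes make unnecessary.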
