Managing a Cluster – Scaling, Node Repair, and Backup - Mastering Apache Cassandra


• Initial tokens: This step is not needed if you are using vnodes. Depending on the type of partitioner that you are using for key distribution, you will need to recalculate the initial tokens for each node in the system (refer to Chapter 4, Deploying a Cluster, for initial token calculation). This means that older nodes will own different data ranges than they originally did. However, there are a few smart tricks for initial token assignment.

◦ N-folding the capacity: If you are doubling, tripling, or otherwise increasing the capacity N times, you will find that the newly generated initial tokens include the older initial tokens. Say, for example, you had a three-node cluster with initial tokens 0, t/3, and 2t/3, where t is the size of the token range. If you decide to triple the capacity by adding six more nodes, the new tokens should be 0, t/9, ... t/3, ... 2t/3, ... 8t/9. The trick here is to leave the tokens that are already in use in the existing cluster untouched and assign the remaining tokens to the new nodes. This saves the extra move commands that would otherwise be needed to adjust the tokens. You just launch the new nodes and wait until the data has finished streaming to them.

◦ Rebalance later: This is the most common technique among those who are new to Cassandra. The idea is not to worry about imbalance up front. You simply launch the new nodes, and Cassandra assigns each of them a token that splits the range of the most heavily loaded node in half. As expected, this does a pretty decent job of removing hotspots from the cluster (which is often exactly what you want when adding a node). Once data streaming between the nodes is done, the cluster may or may not be perfectly balanced, so you may want to load balance it at that point. (Refer to the Load balancing section.)

◦ Right token to the right node: This is the most complex but also the most common case. Usually, you do not double or quadruple the cluster; it is more likely that you are asked to add, say, two new nodes. In this case, you calculate the tokens for the new configuration, edit cassandra.yaml on the new nodes, and set their initial tokens to any of the newly calculated values (no specific choice is required). You then start them and move data around the nodes so that each node complies with the new initial tokens that we calculated. (We'll see how to do this later in this chapter.)

• Start a new node: Whether or not initial tokens have been assigned to the new nodes, start the nodes one at a time. It is recommended to pause for at least two minutes between starting one node and the next, so that the other nodes learn about each new node via gossip.

• Move data: This step is not needed if you are using vnodes. If adding a new node has skewed the data distribution in the cluster, we may need to move data around so that each node has an equal share of the token range. This can
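For the initial-token step, here is a minimal sketch of the usual evenly spaced token calculation for clusters without vnodes. It assumes RandomPartitioner (token space 0 to 2**127) or Murmur3Partitioner (token space -2**63 to 2**63 - 1); the function names are illustrative, not part of any Cassandra tool, so check the result against the formula in Chapter 4 before relying on it.

```python
# Evenly spaced initial tokens for a cluster that does not use vnodes.
# Assumed token spaces (standard for these partitioners):
#   RandomPartitioner:   0 .. 2**127
#   Murmur3Partitioner: -2**63 .. 2**63 - 1


def random_partitioner_tokens(node_count: int) -> list[int]:
    """Token i is i * 2**127 / N, computed with integer arithmetic."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]


def murmur3_tokens(node_count: int) -> list[int]:
    """Token i is -2**63 + i * 2**64 / N, computed with integer arithmetic."""
    return [-(2 ** 63) + i * (2 ** 64) // node_count for i in range(node_count)]


if __name__ == "__main__":
    for i, token in enumerate(random_partitioner_tokens(3)):
        print(f"node {i}: initial_token: {token}")
```

Each value would go into the `initial_token` setting of the corresponding node's cassandra.yaml.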
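The N-folding trick described above works because the evenly spaced layout for k·N nodes contains every token of the layout for N nodes. The sketch below assumes RandomPartitioner and evenly spaced tokens; `tokens_for_added_nodes` is a hypothetical helper that returns only the tokens the new nodes need, using the text's example of tripling a three-node cluster.

```python
def balanced_tokens(node_count: int, ring_size: int = 2 ** 127) -> list[int]:
    # Evenly spaced tokens: i * ring_size / node_count (RandomPartitioner assumed).
    return [i * ring_size // node_count for i in range(node_count)]


def tokens_for_added_nodes(current_count: int, factor: int,
                           ring_size: int = 2 ** 127) -> list[int]:
    """Tokens to give the new nodes when growing the cluster `factor` times.

    Because k * i * ring_size // (k * n) == i * ring_size // n, every token
    already in use reappears in the larger layout, so existing nodes keep
    their tokens and only the new nodes need initial_token values.
    """
    in_use = set(balanced_tokens(current_count, ring_size))
    target = balanced_tokens(current_count * factor, ring_size)
    return [token for token in target if token not in in_use]


if __name__ == "__main__":
    # The example from the text: three nodes, capacity tripled to nine.
    for token in tokens_for_added_nodes(3, 3):
        print(token)
```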
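The "rebalance later" behaviour can be illustrated with a simplified simulation. It assumes RandomPartitioner, no vnodes, that a node's load is proportional to the width of the token range it owns (evenly distributed keys), and that each bootstrapping node takes the midpoint of the widest range. Cassandra itself estimates load from actual data, so treat this as an approximation that merely shows why a cluster grown this way can end up unbalanced.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumption)


def range_widths(tokens: list[int]) -> list[int]:
    """Width of the range (previous token, token] owned by each sorted token."""
    widths = []
    for i, token in enumerate(tokens):
        previous = tokens[i - 1]  # i == 0 wraps around to the last token
        widths.append((token - previous) % RING_SIZE or RING_SIZE)
    return widths


def bootstrap_token(tokens: list[int]) -> int:
    """Token that halves the range of the (approximately) most loaded node."""
    ordered = sorted(tokens)
    widths = range_widths(ordered)
    busiest = max(range(len(ordered)), key=widths.__getitem__)
    start = ordered[busiest - 1]
    return (start + widths[busiest] // 2) % RING_SIZE


def add_nodes(tokens: list[int], count: int) -> list[int]:
    """Bootstrap `count` nodes one after another without initial tokens."""
    ring = sorted(tokens)
    for _ in range(count):
        ring = sorted(ring + [bootstrap_token(ring)])
    return ring


if __name__ == "__main__":
    balanced = [i * RING_SIZE // 3 for i in range(3)]
    grown = add_nodes(balanced, 2)
    for token, width in zip(grown, range_widths(grown)):
        print(f"{token:>40}  owns {100 * width / RING_SIZE:5.1f}%")
```

Adding two nodes to a balanced three-node ring this way halves two ranges and leaves the third untouched, so one node ends up owning roughly twice as much as the others.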
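For the "right token to the right node" case, the plan can be computed ahead of time. This sketch assumes RandomPartitioner and no vnodes, and the host addresses are hypothetical. It pairs the existing nodes, in ring order, with the first target tokens and hands the leftover target tokens to the new nodes as their initial_token values; it makes no attempt to minimise how much data moves. The generated commands use `nodetool move`, which applies only to clusters without vnodes.

```python
def plan_expansion(existing: dict[str, int], new_hosts: list[str],
                   ring_size: int = 2 ** 127) -> tuple[dict[str, int], list[str]]:
    """Return (initial tokens for new hosts, nodetool move commands for old hosts)."""
    total = len(existing) + len(new_hosts)
    target = [i * ring_size // total for i in range(total)]

    # Existing nodes keep their relative order on the ring.
    ordered = sorted(existing.items(), key=lambda item: item[1])
    moves = [
        f"nodetool -h {host} move {new_token}"
        for (host, token), new_token in zip(ordered, target)
        if token != new_token  # a node already on its target token stays put
    ]

    # Whatever target tokens are left over go to the new nodes.
    initial_tokens = dict(zip(new_hosts, target[len(existing):]))
    return initial_tokens, moves


if __name__ == "__main__":
    ring = 2 ** 127
    cluster = {f"10.0.0.{i + 1}": i * ring // 3 for i in range(3)}  # hypothetical hosts
    initial, moves = plan_expansion(cluster, ["10.0.0.4", "10.0.0.5"])
    for host, token in initial.items():
        print(f"{host}: initial_token: {token}")
    for command in moves:
        print(command)
```

The new nodes are started with their initial tokens first; the move commands are then run one node at a time, letting each finish before starting the next.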
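The start-up step can be scripted so that the recommended pause is never skipped. This is a minimal sketch: `start_node` is a hypothetical callback you would supply (for example, one that logs into a host and starts the Cassandra service there), and the 120-second default simply encodes the two-minute pause mentioned in the text.

```python
import time
from typing import Callable


def start_one_by_one(hosts: list[str],
                     start_node: Callable[[str], None],
                     pause_seconds: float = 120.0,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Start each host in turn, pausing between starts so gossip can spread."""
    for index, host in enumerate(hosts):
        if index > 0:
            sleep(pause_seconds)  # give the ring time to learn about the previous node
        start_node(host)


if __name__ == "__main__":
    # Dry run: print instead of starting anything, and skip the real pause.
    start_one_by_one(["10.0.0.4", "10.0.0.5"], start_node=print, pause_seconds=0)
```

The `sleep` parameter is injectable only so the sequencing can be tested without actually waiting.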
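To judge whether the Move data step is needed, one rough check is each node's share of the token ring, similar in spirit to the ownership figures that nodetool reports. This sketch assumes RandomPartitioner, no vnodes, and evenly distributed keys, so it measures token-range share rather than actual data size; the 10 percent tolerance is an arbitrary example value.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumption)


def ownership(tokens: list[int]) -> dict[int, float]:
    """Percentage of the ring owned by each token: the range (previous, token]."""
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        width = (token - ordered[i - 1]) % RING_SIZE or RING_SIZE  # wraps at i == 0
        shares[token] = 100.0 * width / RING_SIZE
    return shares


def needs_rebalance(tokens: list[int], tolerance_pct: float = 10.0) -> bool:
    """True if any share deviates from the ideal 100/N by more than
    tolerance_pct percent of that ideal (a relative tolerance)."""
    ideal = 100.0 / len(tokens)
    return any(abs(share - ideal) > tolerance_pct / 100.0 * ideal
               for share in ownership(tokens).values())


if __name__ == "__main__":
    tokens = [0, RING_SIZE // 2, 3 * RING_SIZE // 4]
    for token, share in ownership(tokens).items():
        print(f"{token:>40}  {share:5.1f}%")
    print("rebalance needed:", needs_rebalance(tokens))
```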
