Initial tokens: This step is not needed if you are using vnodes. Depending on the type of partitioner that you are using for key distribution, you will need to recalculate the initial tokens for each node in the system (refer to Chapter 4, Deploying a Cluster, for initial token calculation). This means that older nodes will end up owning different data than they originally owned. However, there are a few smart tricks for initial token assignment.
N-folding the capacity: If you are doubling, tripling, or otherwise increasing the capacity N times, you'll find that the newly generated initial tokens include the old ones. Say, for example, you have a three-node cluster with initial tokens 0, t/3, and 2t/3, where t is the size of the token range. If you decide to triple the capacity by adding six more nodes, the new tokens should be 0, t/9, ..., t/3, ..., 2t/3, ..., and 8t/9. The trick here is to leave the tokens that are already in use by the existing cluster where they are and assign the remaining tokens to the new nodes. This saves the extra move commands needed to adjust tokens. You just launch the new nodes and wait until the data has finished streaming to them.
Rebalance later: This is the most common technique among those who are just starting out with Cassandra. The idea is not to worry about imbalance up front: you just launch the new nodes, and Cassandra assigns each one a token in the middle of the range owned by the most heavily loaded node. As expected, this does a pretty decent job of removing hotspots from the cluster (which is often what you want when you add a new node). Once data streaming between the nodes is done, the cluster may or may not be perfectly balanced, so you may want to load balance it at this point. (Refer to the Load balancing section.)
Right token to the right node: This is the most complex but also the most common case. Usually, you do not double or quadruple the cluster; it is more likely that you are asked to add two new nodes. In this case, you calculate the tokens for the new configuration, edit cassandra.yaml on the new nodes, and set their initial tokens (it does not matter which of the newly calculated tokens goes to which new node). You then start them and move data around so that all the nodes comply with the new initial tokens that we calculated. (We'll see how to do this later in this chapter.)
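The three token-assignment tricks above can be sketched in Python. This is a minimal illustration, not Cassandra code, and it rests on stated assumptions: a RandomPartitioner-style token space of [0, 2**127) (the t in the text), evenly spaced tokens of i * t / N, and a node with token T owning the range from the previous token up to T. All function names and host names are hypothetical. `bisecting_token` simply halves the heaviest node's token range, which approximates rather than reproduces how Cassandra itself picks a token, and `plan_moves` uses a greedy nearest-token heuristic to decide which existing node moves to which free token.

```python
# Sketch only: assumes RandomPartitioner-style tokens in [0, 2**127).
RING_SIZE = 2 ** 127


def initial_tokens(node_count, ring_size=RING_SIZE):
    """Evenly spaced initial tokens: i * ring_size / node_count."""
    return [i * ring_size // node_count for i in range(node_count)]


def n_fold_tokens(old_count, new_count, ring_size=RING_SIZE):
    """N-folding: split the new token list into tokens already in use
    (existing nodes keep them) and tokens to hand to the new nodes."""
    old = set(initial_tokens(old_count, ring_size))
    new = initial_tokens(new_count, ring_size)
    kept = [t for t in new if t in old]
    to_assign = [t for t in new if t not in old]
    return kept, to_assign


def bisecting_token(load_by_token, ring_size=RING_SIZE):
    """Rebalance later (simplified): return the token that halves the range
    owned by the most heavily loaded node. load_by_token maps each node's
    token to a hypothetical load figure."""
    tokens = sorted(load_by_token)
    i = max(range(len(tokens)), key=lambda k: load_by_token[tokens[k]])
    prev = tokens[i - 1]  # i == 0 wraps around to the last token
    width = (tokens[i] - prev) % ring_size or ring_size  # single node owns all
    return (prev + width // 2) % ring_size


def plan_moves(current, new_hosts, ring_size=RING_SIZE):
    """Right token to the right node: return (initial token per new host,
    nodetool move commands for existing hosts whose token must change).
    current maps hypothetical host names to their current tokens."""
    targets = set(initial_tokens(len(current) + len(new_hosts), ring_size))
    in_use = set(current.values())
    free = sorted(t for t in targets if t not in in_use)

    def distance(a, b):
        # Shortest distance between two tokens around the ring.
        return min((a - b) % ring_size, (b - a) % ring_size)

    moves = []
    for host, token in sorted(current.items(), key=lambda kv: kv[1]):
        if token in targets:
            continue  # already sits on a target token; no move needed
        best = min(free, key=lambda t: distance(t, token))  # greedy choice
        free.remove(best)
        moves.append(f"nodetool -h {host} move {best}")
    # Whatever tokens remain go to the new nodes, in any order.
    return dict(zip(new_hosts, free)), moves
```

For example, on a toy ring of size 15, growing from nodes a, b, and c (tokens 0, 5, and 10) to five nodes leaves a in place, moves b to 6 and c to 9, and gives the new nodes d and e the tokens 3 and 12 to put in their cassandra.yaml. The greedy pairing keeps each move short but is not guaranteed to minimize total data movement.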
Start a new node: Whether or not initial tokens have been assigned to the new nodes, start them one at a time. It is recommended to pause for at least 2 minutes between starting two nodes, so that the other nodes learn about each new node via gossip.
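The one-at-a-time start with a pause can be expressed as a small helper. This is a sketch under assumptions: `start_node` is a hypothetical, caller-supplied function (for instance, one that runs the Cassandra start script on a host over SSH), and the 120-second pause is the 2-minute minimum from the text.

```python
import time

PAUSE_SECONDS = 120  # at least 2 minutes between node starts


def start_nodes_staggered(hosts, start_node, pause_seconds=PAUSE_SECONDS,
                          sleep=time.sleep):
    """Start nodes one at a time, pausing between starts so the rest of the
    cluster has time to learn about each new node via gossip.
    start_node is a hypothetical callable that starts Cassandra on a host."""
    for index, host in enumerate(hosts):
        if index > 0:
            sleep(pause_seconds)  # no pause before the first node
        start_node(host)
```

A fixed sleep is the simplest reading of the advice; it does not itself verify that the other nodes have actually seen the new node, so checking the ring before starting the next node is a reasonable extra precaution.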
Move data: This step is not needed if you are using vnodes. If adding a new node has skewed the data distribution in the cluster, we may need to move data around so that each node has an equal share of the token range. This can