Managing a Cluster – Scaling, Node Repair, and Backup - Mastering Apache Cassandra

Load balancing

A balanced Cassandra cluster is one where each node owns an equal share of the keys. When you run nodetool status, a balanced cluster shows the same percentage for every node under the Owns or Effective Ownership column. If the keys are not uniformly distributed across the token range, some nodes will hold more data than others even with equal ownership. We use RandomPartitioner or Murmur3Partitioner to avoid this sort of lopsided cluster, because both hash each row key before placing it on the ring.
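To make the ownership percentages concrete, here is a minimal Python sketch of how they follow from the tokens. It assumes one token per node and that each node owns the range from the previous token (exclusive) up to its own token (inclusive), wrapping around the ring. The function name `ownership` and the `ring_size` parameter are illustrative and not part of Cassandra or nodetool.

```python
from fractions import Fraction

# Assumption: RandomPartitioner token space of 2**127 positions.
RING_SIZE = 2 ** 127


def ownership(tokens, ring_size=RING_SIZE):
    """Map each token to the fraction of the ring it owns.

    Hypothetical helper: a token owns the range (previous token, token],
    with the smallest token wrapping around to the largest one.
    """
    ordered = sorted(tokens)
    if len(ordered) == 1:
        # A single node owns the whole ring.
        return {ordered[0]: Fraction(1)}
    shares = {}
    for i, tok in enumerate(ordered):
        prev = ordered[i - 1]  # index -1 wraps to the largest token
        span = (tok - prev) % ring_size
        shares[tok] = Fraction(span, ring_size)
    return shares
```

On a toy ring of 100 positions, `ownership([0, 10, 20], 100)` gives the node at token 0 four fifths of the ring and the other two nodes one tenth each, which is the kind of skew that load balancing is meant to remove.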

Note

This section applies only to a setup that does not use vnodes, that is, a cluster that uses one token per Cassandra instance. If you are using Cassandra Version 1.2 or later with default settings, you can skip this section.

Whenever a node is added or decommissioned, the token distribution becomes skewed. You generally want Cassandra to be evenly load balanced to avoid hotspots. Fortunately, rebalancing is easy. The two-step load balancing process is as follows:

1. Calculate the initial tokens based on the partitioner that you are using. They can be generated manually by dividing the partitioner's token range equally among the nodes.

If you are using RandomPartitioner, you can use tools/bin/token-generator to generate tokens for you. For example, the following command generates the tokens for two data centers, each with three nodes:

$ tools/bin/token-generator 3 3

DC #1:

Node #1: 0

Node #2: 56713727820156410577229101238628035242

Node #3: 113427455640312821154458202477256070484

DC #2:

Node #1: 169417178424467235000914166253263322299

Node #2: 55989722784154413846455963776007251813

Node #3: 112703450604310824423685065014635287055
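The manual calculation mentioned in step 1 can be sketched in Python as follows. This is an illustration, not the token-generator tool itself: the function names are hypothetical, the token ranges are assumed to be 0 to 2**127 for RandomPartitioner and -2**63 to 2**63 - 1 for Murmur3Partitioner, and the sketch covers a single data center only (it does not compute the offset applied to DC #2 above).

```python
def random_partitioner_tokens(node_count):
    """Evenly spaced tokens for RandomPartitioner (assumed range 0 .. 2**127)."""
    step = 2 ** 127 // node_count
    return [i * step for i in range(node_count)]


def murmur3_tokens(node_count):
    """Evenly spaced tokens for Murmur3Partitioner (assumed range -2**63 .. 2**63 - 1)."""
    step = 2 ** 64 // node_count
    return [-(2 ** 63) + i * step for i in range(node_count)]


if __name__ == "__main__":
    for number, token in enumerate(random_partitioner_tokens(3), start=1):
        print(f"Node #{number}: {token}")
```

For three nodes, `random_partitioner_tokens(3)` yields the same three values listed under DC #1 in the output above.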
