For a single data center setup, the initial tokens divide the token range evenly across the nodes.
Multiple data center setups
Multiple data center setups introduce a complication. Suppose you have two data centers, each with three nodes; the keyspace definition looks like this:
CREATE KEYSPACE my_keyspace
WITH REPLICATION = {
  'class' : 'NetworkTopologyStrategy',
  'DC1' : 3,
  'DC2' : 3
};
This definition implies at least six nodes in the ring: three copies of each row are kept in DC1 and three more in DC2. In Cassandra 1.2 and later, Cassandra distributes the data almost evenly and ensures that the specified number of replicas lives in each data center. If you are running a pre-1.2 version of Cassandra, or have decided not to use vnodes, you should know how to assign tokens so that both data centers are loaded equally. (If you are using vnodes, you can safely skip to the next section.)
Assume the system actually has four nodes in each data center, and you calculated the initial tokens by dividing the possible token range into eight equidistant values. If you assign the first four tokens to the four nodes in DC1 and the rest to the nodes in DC2, you will end up with a lopsided data distribution.
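As a minimal sketch, the naive layout just described can be computed directly. The 2^127 range assumed here matches Cassandra's classic RandomPartitioner; the function name is illustrative, not a Cassandra API:

```python
# Sketch: eight equidistant initial tokens for a RandomPartitioner
# ring (token range 0 .. 2**127 - 1), with the first four handed to
# DC1 and the rest to DC2 -- the naive layout described above.
RING_SIZE = 2 ** 127

def equidistant_tokens(node_count):
    """Divide the full token range into node_count equal slices."""
    return [i * RING_SIZE // node_count for i in range(node_count)]

tokens = equidistant_tokens(8)
dc1_tokens, dc2_tokens = tokens[:4], tokens[4:]
print("DC1 tokens:", dc1_tokens)
print("DC2 tokens:", dc2_tokens)
```

The tokens themselves are perfectly evenly spaced; the imbalance comes from how they are split between the data centers, as the next example shows.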
Let's take an example. Say we have a partitioner that generates tokens from 0 to 200. If tokens are distributed in the way previously mentioned, the resulting ring will look like the one shown in the following figure. Since the replication factor is bound by the data center, a large contiguous slice of the data will go to one single node in Data Center 1, while the other nodes in that data center will own relatively few keys. The same happens in Data Center 2, which also ends up with one overloaded node.
This creates a need for a mechanism that balances nodes within each data center. The first option is to divide the partitioner range by the number of nodes in each data center and assign those values to the nodes of each data center independently. However, this won't work, because no two nodes can have the same token. The following figure shows multiple data centers with even key distribution causing lopsided nodes:
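The collision is easy to see in a short sketch, again assuming the toy 0–200 partitioner; the helper name is illustrative:

```python
# Sketch: dividing the 0-200 range independently per data center
# yields the same four tokens in each DC -- illegal, since no two
# nodes in a cluster may share a token.
RING = 200

def per_dc_tokens(node_count, ring=RING):
    """Equidistant tokens computed for one DC in isolation."""
    return [i * ring // node_count for i in range(node_count)]

dc1 = per_dc_tokens(4)  # [0, 50, 100, 150]
dc2 = per_dc_tokens(4)  # [0, 50, 100, 150] -- collides with DC1
print(dc1 == dc2)       # True: every token is duplicated
```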