Deploying a Cluster - Mastering Apache Cassandra

Database Reference

In-Depth Information

26 nodes, a partitioner such as ByteOrderedPartitioner and each node is respons-

ible for one letter. Therefore, the first node is responsible for all the keys starting with A,

the second for B, and so on. A column family that uses the usernames as row keys will

have uneven data distribution across the ring. The data distribution will be skewed with

nodes X, Q, and Z being very light, and nodes A and S being heavily loaded.

This is bad for multiple reasons, with the most important one being the generation of a

hotspot. The nodes with more data will be accessed more than the ones with less data. The

overall performance of a cluster may be dropped down to the number of requests that a

couple of highly loaded nodes can serve.

The best way to assign the initial token to a cluster using ByteOrderedPartitioner

is to sample data and determine what keys are the best to assign as initial tokens to ensure

an equally balanced cluster.

Let's take a hypothetical case where your keys of all keyspaces can be represented by five-

character strings from "00000" to "zzzzz" . Here is how we generate initial tokens in

Python:

>>> start = int("00000".encode('hex'), 16)

>>> end = int("zzzzz".encode('hex'), 16)

>>> range = end - start

>>> nodes = 8

>>> print "\n".join([ "Node #" + str(i+1) + ": %032x" %

(start + range*i/nodes) for i in xrange(nodes) ])

Node #1: 00000000000000000000003030303030

Node #2: 00000000000000000000003979797979

Node #3: 000000000000000000000042c2c2c2c2

Node #4: 00000000000000000000004c0c0c0c0b

Node #5: 00000000000000000000005555555555

Node #6: 00000000000000000000005e9e9e9e9e

Node #7: 000000000000000000000067e7e7e7e7

Node #8: 00000000000000000000007131313130

Remember, this is just an example. In a real case, you will decide this only after evaluat-

ing the data, or if you probably want to have initial tokens assigned by UUIDs.

Search WWH ::

Custom Search

Home