Database Reference
In-Depth Information
26 nodes, a partitioner such as ByteOrderedPartitioner and each node is respons-
ible for one letter. Therefore, the first node is responsible for all the keys starting with A,
the second for B, and so on. A column family that uses the usernames as row keys will
have uneven data distribution across the ring. The data distribution will be skewed with
nodes X, Q, and Z being very light, and nodes A and S being heavily loaded.
This is bad for multiple reasons, with the most important one being the generation of a
hotspot. The nodes with more data will be accessed more than the ones with less data. The
overall performance of a cluster may be dropped down to the number of requests that a
couple of highly loaded nodes can serve.
The best way to assign the initial token to a cluster using ByteOrderedPartitioner
is to sample data and determine what keys are the best to assign as initial tokens to ensure
an equally balanced cluster.
Let's take a hypothetical case where your keys of all keyspaces can be represented by five-
character strings from "00000" to "zzzzz" . Here is how we generate initial tokens in
Python:
>>> start = int("00000".encode('hex'), 16)
>>> end = int("zzzzz".encode('hex'), 16)
>>> range = end - start
>>> nodes = 8
>>> print "\n".join([ "Node #" + str(i+1) + ": %032x" %
(start + range*i/nodes) for i in xrange(nodes) ])
Node #1: 00000000000000000000003030303030
Node #2: 00000000000000000000003979797979
Node #3: 000000000000000000000042c2c2c2c2
Node #4: 00000000000000000000004c0c0c0c0b
Node #5: 00000000000000000000005555555555
Node #6: 00000000000000000000005e9e9e9e9e
Node #7: 000000000000000000000067e7e7e7e7
Node #8: 00000000000000000000007131313130
Remember, this is just an example. In a real case, you will decide this only after evaluat-
ing the data, or if you probably want to have initial tokens assigned by UUIDs.
Search WWH ::




Custom Search