Partitioners
By assigning initial tokens, we have created buckets of keys. What determines which key goes
to which bucket? The partitioner. A partitioner generates a hash value of the row key, and on
the basis of that hash value Cassandra determines which bucket (node) the row goes to. This
works because a hash function is deterministic: the same row key always produces the same
hash value. The same computation is therefore used to determine which node to read from.
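The lookup described above can be sketched in a few lines of Python. This is a minimal illustration and not Cassandra's actual code: the node names, the evenly spaced initial tokens, and the helper names token_for, build_ring, and node_for are all assumptions made for the example. The hashing is loosely modelled on an MD5-based partitioner, where the digest is read as a signed integer and its absolute value is used as the token.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token range assumed for this sketch


def token_for(row_key: bytes) -> int:
    """Hash a row key to a token: MD5 digest read as a signed integer, absolute value."""
    digest = hashlib.md5(row_key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def build_ring(node_names):
    """Assign evenly spaced initial tokens (hypothetical layout); return sorted (token, node) pairs."""
    n = len(node_names)
    return sorted((i * RING_SIZE // n, name) for i, name in enumerate(node_names))


def node_for(ring, row_key: bytes) -> str:
    """Return the first node whose token is >= the key's token, wrapping around the ring."""
    tokens = [token for token, _ in ring]
    idx = bisect.bisect_left(tokens, token_for(row_key))
    return ring[idx % len(ring)][1]


ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
owner = node_for(ring, b"user:42")
```

Calling node_for again with the same key returns the same node, which is why the write path and the read path can both locate a row without any central directory.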
Like everything else in Cassandra, a partitioner is a pluggable interface. You can implement
your own partitioner by implementing org.apache.cassandra.dht.IPartitioner and dropping
the .class or .jar file in Cassandra's lib directory.
Here is how you set the partitioner in cassandra.yaml:
partitioner: org.apache.cassandra.dht.RandomPartitioner
In most cases, the default partitioner is a good choice because it distributes keys evenly.
As of version 2.1.0, the default is Murmur3Partitioner (it has been the default since
version 1.2), whereas versions before 1.2 default to RandomPartitioner. Murmur3Partitioner
is faster and slightly more efficient than RandomPartitioner.
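To make the claim about even distribution concrete, the following sketch hashes 10,000 sequential keys and counts how many land in each of four equal token ranges. It is an illustration under stated assumptions, not Cassandra's implementation: the four-node layout, the key format, and the helper names token_for and bucket_for are hypothetical, and the bucket is computed by simple range division rather than a real ring lookup.

```python
import hashlib
from collections import Counter

NODES = 4              # hypothetical cluster size
RING_SIZE = 2 ** 127   # token range assumed for this sketch


def token_for(row_key: bytes) -> int:
    """MD5 digest read as a signed integer, absolute value (modelled on an MD5-based partitioner)."""
    digest = hashlib.md5(row_key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def bucket_for(token: int) -> int:
    """Map a token to one of NODES equal ranges; clamp the single top-edge value."""
    return min(token * NODES // RING_SIZE, NODES - 1)


# Sequential, lexically adjacent keys still scatter across all buckets once hashed.
counts = Counter(bucket_for(token_for(f"user:{i}".encode())) for i in range(10_000))

for bucket in sorted(counts):
    print(f"bucket {bucket}: {counts[bucket]} keys")
```

Each bucket should receive roughly a quarter of the keys, even though the keys themselves are consecutive.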
Be warned that choosing a partitioner is a critical decision, because it determines which
data lives where and it affects the SSTable structure. If you decide to change it, you need
to clean the data directory. Thus, the partitioner chosen when the cluster is first started
is likely to stay for the lifetime of the cluster.
Cassandra provides three partitioners by default.
Note
Actually, there are five partitioners, but two are deprecated and not recommended for use,
so we will not discuss them here. They are OrderPreservingPartitioner and
CollatingOrderPreservingPartitioner.
The Random partitioner
The Random partitioner was the default partitioner before version 1.2. It uses an MD5 hash
to generate hash values for row keys. Since hashes are not generated in any orderly manner, it