Cassandra's partitioning strategy: partition key tokens
As application developers working with Cassandra, we never need to go through the above
calculus: when we write or read data, we can perform the query on any node in the cluster,
and Cassandra will figure out where the data lives. The distribution process is entirely
transparent to the application.
As it turns out, Cassandra uses a strategy analogous to the naïve primary key modulus
approach described above. Recall from Chapter 2, The First Table, that Cassandra has a
TOKEN function that generates an integer value for any partition key; when we retrieve
results over multiple partition keys, the rows are ordered by this token.
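To make the idea concrete, here is a minimal Python sketch of what a token function does: it maps an arbitrary partition key onto a fixed, signed 64-bit integer space. Cassandra's default Murmur3Partitioner uses MurmurHash3 for this; the sketch substitutes a hash from Python's standard library, so the values it prints will not match the output of the real TOKEN function, but the principle is the same.

import hashlib

def toy_token(partition_key):
    # Hash the partition key and interpret the first 8 bytes as a signed
    # 64-bit integer, mimicking the shape of Cassandra's token space
    # (-2**63 to 2**63 - 1). Cassandra actually uses MurmurHash3, so these
    # values are illustrative only and won't match TOKEN's output.
    digest = hashlib.blake2b(partition_key.encode('utf-8'), digest_size=8).digest()
    return int.from_bytes(digest, byteorder='big', signed=True)

for username in ('alice', 'bob', 'ivan'):
    print(username, toy_token(username))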
Distributing partition tokens
When Cassandra distributes data, it assigns each node a range of tokens; a row is stored on
the node within whose token range its partition key token falls. Since tokens are generated
using a hashing function, token values are distributed evenly across the entire range of
possible values. So, as long as the number of partition keys is much larger than the number
of nodes in the cluster, partition keys will be balanced evenly across the nodes, provided
each node is responsible for an equally sized portion of the token range.
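To see how this ownership rule works, here is a simplified Python sketch that divides the 64-bit token space into equal, contiguous ranges for a hypothetical three-node cluster and looks up which node owns a given token. Real clusters complicate this picture with virtual nodes and replication, and the node names below are invented for the example.

# Split the signed 64-bit token space into equal, contiguous ranges,
# one per node. Production clusters assign many small ranges per node
# (virtual nodes), but the ownership rule is the same idea.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1
NODES = ['node1', 'node2', 'node3']  # hypothetical node names

RANGE_SIZE = (MAX_TOKEN - MIN_TOKEN) // len(NODES) + 1

def owning_node(token):
    # A row is stored on whichever node's token range contains the
    # token of its partition key.
    index = (token - MIN_TOKEN) // RANGE_SIZE
    return NODES[index]

print(owning_node(-9_000_000_000_000_000_000))  # node1
print(owning_node(8_000_000_000_000_000_000))   # node3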
Let's take a look at a few rows in our users table and examine how they would be
distributed in a three-node cluster:
SELECT "username", TOKEN("username")
FROM "users"
WHERE "username" IN ('alice', 'bob', 'ivan');
We'll see that these three rows have partition key tokens that are pretty evenly distributed
across the 64-bit space of possible tokens: