Cassandra's partitioning strategy: partition key tokens
As application developers working with Cassandra, we never need to go through the above
calculus: when we write or read data, we can perform the query on any node in the cluster,
and Cassandra will figure out where the data lives. The distribution process is entirely
transparent to the application.
As it turns out, Cassandra uses a strategy analogous to the naïve primary key modulus
approach described above. Recall from Chapter 2, The First Table, that Cassandra has a
TOKEN function that generates an integer value for any partition key; when we retrieve
results over multiple partition keys, the rows are ordered by this token.
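To make the idea concrete, here is a minimal Python sketch of what a token function does: it maps an arbitrary partition key onto a fixed, signed 64-bit integer space. Cassandra's default Murmur3Partitioner uses MurmurHash3 for this; the sketch substitutes a hash from Python's standard library, so the values it prints will not match the output of the real TOKEN function, but the principle is the same.

import hashlib

def toy_token(partition_key):
    # Hash the partition key and interpret the first 8 bytes as a signed
    # 64-bit integer, mimicking the shape of Cassandra's token space
    # (-2**63 to 2**63 - 1). Cassandra actually uses MurmurHash3, so these
    # values are illustrative only and won't match TOKEN's output.
    digest = hashlib.blake2b(partition_key.encode('utf-8'), digest_size=8).digest()
    return int.from_bytes(digest, byteorder='big', signed=True)

for username in ('alice', 'bob', 'ivan'):
    print(username, toy_token(username))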
Distributing partition tokens
When Cassandra distributes data, it assigns each node a range of tokens; a row is stored on
the node within whose token range its partition key token falls. Since tokens are generated
using a hashing function, token values are distributed evenly across the entire range of
possible values. So, as long as the number of partition keys is much larger than the number
of nodes in the cluster, partition keys will be balanced evenly across the nodes, provided
each node is responsible for an equally sized portion of the token range.
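To see how this ownership rule works, here is a simplified Python sketch that divides the 64-bit token space into equal, contiguous ranges for a hypothetical three-node cluster and looks up which node owns a given token. Real clusters complicate this picture with virtual nodes and replication, and the node names below are invented for the example.

# Split the signed 64-bit token space into equal, contiguous ranges,
# one per node. Production clusters assign many small ranges per node
# (virtual nodes), but the ownership rule is the same idea.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1
NODES = ['node1', 'node2', 'node3']  # hypothetical node names

RANGE_SIZE = (MAX_TOKEN - MIN_TOKEN) // len(NODES) + 1

def owning_node(token):
    # A row is stored on whichever node's token range contains the
    # token of its partition key.
    index = (token - MIN_TOKEN) // RANGE_SIZE
    return NODES[index]

print(owning_node(-9_000_000_000_000_000_000))  # node1
print(owning_node(8_000_000_000_000_000_000))   # node3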
Let's take a look at a few rows in our users table and examine how they would be
distributed in a three-node cluster:
SELECT "username", TOKEN("username")
FROM "users"
WHERE "username" IN ('alice', 'bob', 'ivan');
We'll see that these three rows have partition key tokens that are pretty evenly distributed
across the 64-bit space of possible tokens: