Anatomy of a compound primary key
At this point, it's clear that there's some nuance in the compound primary key that we're
missing. Both the username column and the id column affect the order in which rows
are returned; however, while the actual ordering of username is opaque, the ordering of
id is meaningfully related to the information encoded in the id column.
In the lexicon of Cassandra, username is a partition key. A table's partition key groups
rows together into logically related bundles. In the case of our MyStatus application, each
user's timeline is a self-contained data structure, so partitioning the table by user is a sound
strategy.
Note
As a general rule, you should endeavor to only query one partition at a time for any core
data access your application does. Cassandra stores the rows in each partition together, so
queries within a partition are very efficient. Queries across multiple partitions, on the other
hand, are expensive and should be avoided.
We call the id column a clustering column. The job of a clustering column is to determine
the ordering of rows within a partition. This is why we observed that within each user's
status updates, the rows were returned in strictly ascending order by the timestamp encoded
in the id.
This is a very useful property, since our application will want to display status updates
ordered by creation time.
Note
Is sorting by clustering column efficient?
Sorting any collection at read time is expensive for a non-trivial number of elements. Happily,
Cassandra stores rows in clustering order, so when you retrieve them, it simply returns
them in the order they're stored in. There's no expensive sorting operation at read time.
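The roles of the partition key and the clustering column can be illustrated with a small
in-memory model. This is only a sketch, not how Cassandra is implemented: the ToyTable class
and its method names are hypothetical, plain integers stand in for the timestamps encoded in
id, and overwriting a row that has the same username and id is not modeled. It shows rows
being grouped by username and placed in id order at write time, so a single-partition read
needs no sorting.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy model of a table keyed by (username, id). Hypothetical, for illustration."""

    def __init__(self):
        # Partition key (username) -> list of (id, body) rows kept in sorted order.
        self._partitions = defaultdict(list)

    def insert(self, username, id_, body):
        # The row is placed at its sorted position when written,
        # so reads never have to sort.
        bisect.insort(self._partitions[username], (id_, body))

    def read_partition(self, username):
        # A single-partition read returns rows in the order they are stored.
        return list(self._partitions.get(username, []))


table = ToyTable()
table.insert("alice", 3, "third update")
table.insert("bob", 1, "hello")
table.insert("alice", 1, "first update")
table.insert("alice", 2, "second update")

alice_rows = table.read_partition("alice")
for id_, body in alice_rows:
    print(id_, body)
```

Although alice's rows were written out of order and interleaved with bob's, reading her
partition yields them in ascending id order, and bob's row does not appear in it.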
All of the rows that share the same partition key are stored in a contiguous structure on disk.
It's within this structure that rows are sorted by their clustering column values. Because
each partition is tightly bound at the storage level, there is an upper bound on the number
of rows that can share the same partition key. In theory, this limit is about 2 billion total
column values. For instance, if you have a table with 10 data columns, your upper bound
would be 200 million rows per partition key.
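The arithmetic behind that estimate can be written out as a short sketch. The constant and
function names here are hypothetical, and the figure of 2 billion is the approximate
theoretical limit quoted above, not an exact value.

```python
# Approximate theoretical limit on column values per partition, as quoted in the text.
MAX_COLUMN_VALUES_PER_PARTITION = 2_000_000_000


def max_rows_per_partition(data_columns):
    """Rough upper bound on rows sharing one partition key."""
    if data_columns < 1:
        raise ValueError("data_columns must be at least 1")
    # Each row contributes one value per data column.
    return MAX_COLUMN_VALUES_PER_PARTITION // data_columns


rows_for_ten_columns = max_rows_per_partition(10)
print(rows_for_ten_columns)  # 200000000
```

The bound shrinks in proportion to the number of data columns, so wider tables reach it
with fewer rows per partition.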