Partition keys group data on the same node
In Chapter 3, Organizing Related Data, you learned that tables with compound primary
keys store all rows sharing the same partition key in contiguous physical storage. This
leads to the observation that querying for ranges of clustering column values within a
single partition key is highly efficient. To perform this sort of lookup, Cassandra need
only locate the beginning of the range on disk, and can then read all the results beginning
at that location. Conversely, querying for rows spanning multiple partition keys requires
an inefficient random disk scan for each partition key being queried.
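
To make the contrast concrete, here is a minimal Python sketch of why a range lookup inside one partition is cheap. It is a toy model, not Cassandra's storage engine: the rows of a single partition are held in a list sorted by clustering column value, and the names partition_rows and range_query, along with the sample data, are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Toy model (an assumption for illustration): one partition's rows,
# kept sorted by clustering column value, as (clustering, payload) pairs.
partition_rows = [
    (1, "first update"),
    (3, "second update"),
    (4, "third update"),
    (7, "fourth update"),
    (9, "fifth update"),
]

def range_query(rows, low, high):
    """Return the rows whose clustering value lies in [low, high]."""
    keys = [clustering for clustering, _ in rows]
    start = bisect_left(keys, low)    # locate the beginning of the range
    end = bisect_right(keys, high)    # first position past the end of the range
    return rows[start:end]            # one contiguous read, no further seeking

result = range_query(partition_rows, 3, 7)
```

The only search is for the start of the range; everything after that is a sequential read. A query spanning several partition keys would have to repeat that search once per partition.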
Your new understanding of data partitioning expands this observation: you now know that
querying for multiple partition keys not only requires Cassandra to make multiple disk
scans, but very likely will also require retrieving data from multiple nodes and collating
the results. Cassandra is entirely capable of performing this operation: a coordinator
node reads from multiple nodes and collates the results, and the whole process is
transparent to the application. But it's important to remember that reading data from
multiple partitions, and thus possibly from multiple nodes, is expensive and best avoided
for performance-sensitive operations.
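
The cost of a multi-partition read can be sketched in Python as follows. This is a simplified model under stated assumptions: the four node names are hypothetical, the token space is split into equal contiguous ranges, replication is ignored, and MD5 stands in for the real partitioner's hash function. The helper names token_for, node_for, and plan_read are invented for this illustration.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
RING_SIZE = 2 ** 32                               # toy token space

def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (MD5 is a stand-in hash here)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def node_for(partition_key: str) -> str:
    """Map a token to the node owning its equal-width token range."""
    range_width = RING_SIZE // len(NODES)
    return NODES[token_for(partition_key) // range_width]

def plan_read(partition_keys):
    """Group the requested partition keys by the node that holds them,
    roughly what a coordinator must work out before fanning out a read."""
    plan = {}
    for key in partition_keys:
        plan.setdefault(node_for(key), []).append(key)
    return plan

single = plan_read(["alice"])                        # always exactly one node
multi = plan_read(["alice", "bob", "carol", "dave"])  # may span several nodes
```

A read for one partition key always resolves to a single entry in the plan, while a read for several keys can produce one entry per node, each of which means a network round trip plus a merge step on the coordinator.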
Virtual nodes
The model of data distribution we have developed thus far is, in fact, a simplification of
how a modern Cassandra cluster works. While versions of Cassandra prior to 1.2 did
directly map ranges of tokens onto physical nodes, Cassandra 1.2 introduced virtual nodes,
which act as an intermediary in the mapping process.
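
As a rough illustration of the idea, the Python sketch below gives each physical node several tokens on the ring instead of one, so that each node owns many small token ranges. It is a toy model, not Cassandra's implementation: the node names, the random token assignment, the 32-bit token space, and the helper names build_ring and owner are all assumptions made for this example.

```python
import bisect
import random

def build_ring(nodes, tokens_per_node, seed=42):
    """Assign each physical node several tokens (its virtual nodes)
    and return the ring as a sorted list of (token, node) pairs."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    ring = []
    for node in nodes:
        for _ in range(tokens_per_node):
            ring.append((rng.randrange(2 ** 32), node))
    ring.sort()
    return ring

def owner(ring, token):
    """Find the physical node owning a token: the first virtual node whose
    token is >= the requested token, wrapping around at the end of the ring."""
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token) % len(ring)
    return ring[index][1]

nodes = ["node-a", "node-b", "node-c"]  # hypothetical physical nodes
ring = build_ring(nodes, tokens_per_node=8)
```

The lookup now goes from token to virtual node to physical node, which is the intermediary step the text describes. With many small ranges per node, ownership is interleaved around the ring rather than held in one large block per machine.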