Reading and Writing Data - Cassandra: The Definitive Guide

Ranges and Slices

A range basically refers to a mathematical range, where you have a set of ordered elements and

you want to specify some subset of those elements by defining a start element and a finish

element. The range is the representation of all the elements between start and finish, inclusive.

Ranges typically refer to ranges of keys (rows). The term slice is used to refer to a range of

columns within a row.

The range works according to the column family's comparator. That is, given columns a, b,

c, d, and e, the range of (a,c) includes columns a, b, and c. So, if you have 1,000 columns

with names that are long integers, you can see how you could easily specify a range of columns

between 35 and 45. By using ranges, you can retrieve all of the columns in a range that you

define (called a range slice), or you can perform the same update to the items in the range using

a batch.
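
The inclusive slice described above can be sketched in plain Python. This is only an illustration of the idea, not Cassandra's API: the function column_slice and the sample rows are hypothetical, and the sorted list stands in for the ordering that the comparator maintains.

```python
from bisect import bisect_left, bisect_right


def column_slice(columns, start, finish):
    """Return the (name, value) pairs whose names fall in [start, finish].

    `columns` must already be sorted by name, mirroring how a comparator
    keeps the columns of a row in order. (Hypothetical helper.)
    """
    names = [name for name, _ in columns]
    lo = bisect_left(names, start)    # first index with name >= start
    hi = bisect_right(names, finish)  # one past the last index with name <= finish
    return columns[lo:hi]


# A hypothetical row with 1,000 columns named by the integers 1..1000.
row = [(n, f"value-{n}") for n in range(1, 1001)]
picked = column_slice(row, 35, 45)
print([name for name, _ in picked])

# The (a,c) example from the text: both endpoints are included.
letters = [(c, c.upper()) for c in "abcde"]
print(column_slice(letters, "a", "c"))
```

Because the columns are sorted, finding both ends of the slice is a binary search rather than a scan of the whole row.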

You may have many hundreds of columns defined on a row, but you might not want to retrieve

all of them in a given query. Columns are stored in sorted order, so the range query is provided

so that you can fetch columns within a range of column names.

NOTE

Range queries require using an OrderPreservingPartitioner, so that keys are returned in the order

defined by the collation used by the partitioner.

When specifying a range query and using Random Partitioner, there's really no way to specify

a range more narrow than “all”. This is an expensive proposition, because you might

incur additional network operations. It can also potentially result in missed keys. That's because

a row scan can miss an update that happens at the same time as the scan, if that update is

made earlier in the index than what you are currently processing.

There is another thing that can be confusing at first. When you are using Random Partitioner,

you must recall that range queries first hash the keys. So if you are using a range of “Alice” to

“Alison”, the query will first run a hash on each of those keys and return not simply the natural

values between Alice and Alison, but rather the values between the hashes of those values.
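
To see why hashing changes the result, here is a small sketch that compares a range over the natural key order with a range over hashed keys. It assumes an MD5-based token as a stand-in for what the partitioner computes; the functions token, natural_range, and token_range and the sample keys are hypothetical, and the sketch ignores details such as ranges that wrap around the ring.

```python
import hashlib


def token(key: str) -> int:
    """Illustrative token: the MD5 digest of the key read as an integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def natural_range(keys, start, finish):
    """Keys between start and finish in plain string order, inclusive."""
    return sorted(k for k in keys if start <= k <= finish)


def token_range(keys, start, finish):
    """Keys whose tokens lie between the tokens of start and finish, inclusive."""
    # Simplification: order the two tokens instead of handling wraparound.
    lo, hi = sorted((token(start), token(finish)))
    return sorted((k for k in keys if lo <= token(k) <= hi), key=token)


keys = ["Alice", "Alicia", "Alison", "Bob", "Carol", "Dave", "Zed"]
print(natural_range(keys, "Alice", "Alison"))  # ordered by name
print(token_range(keys, "Alice", "Alison"))    # ordered by hash
```

The first result contains only the names that sort between Alice and Alison. The second depends entirely on where each key's hash happens to fall, so it can include unrelated keys and leave out names that sit between the two endpoints alphabetically.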

Here is the basic flow of a read operation that looks for a specific key when using Random

Partitioner. First, the key is hashed, and then the client connects to any node in the cluster. That

node will route the request to the node with that key. The memtable is consulted to see whether

your key is present; if it's not found, then the Bloom filter for each file is checked,

starting with the newest one. Once a Bloom filter reports that the key may be present, the

corresponding datafile is consulted to find the column values.
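
The node-local part of this flow can be sketched as follows. This is a simplified model under stated assumptions, not Cassandra's implementation: the classes BloomFilter and DataFile, the function read, and the sample data are all hypothetical, and hashing the key for routing between nodes is left out.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=256, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class DataFile:
    """Stand-in for one on-disk datafile and its Bloom filter."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read(key, memtable, datafiles_newest_first):
    # 1. The memtable is consulted first.
    if key in memtable:
        return memtable[key]
    # 2. Otherwise check each file's Bloom filter, newest file first.
    for datafile in datafiles_newest_first:
        if datafile.bloom.might_contain(key):
            # 3. A positive may be false, so the datafile has the final say.
            if key in datafile.rows:
                return datafile.rows[key]
    return None


memtable = {"alice": {"email": "alice@new.example"}}
older = DataFile({"alice": {"email": "alice@old.example"},
                  "bob": {"email": "bob@old.example"}})
newer = DataFile({"bob": {"email": "bob@new.example"}})

print(read("alice", memtable, [newer, older]))  # served from the memtable
print(read("bob", memtable, [newer, older]))    # served from the newest file
print(read("carol", memtable, [newer, older]))  # not stored anywhere
```

The Bloom filter lets the read skip files that definitely do not hold the key, so the more expensive datafile lookup happens only where the key might actually be stored.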
