Ranges and Slices
A range basically refers to a mathematical range, where you have a set of ordered elements and
you want to specify some subset of those elements by defining a start element and a finish
element. The range is the representation of all the elements between start and finish, inclusive.
Ranges typically refer to ranges of keys (rows). The term slice is used to refer to a range of
columns within a row.
The range works according to the column family's comparator. That is, given columns a, b,
c, d, and e, the range of (a,c) includes columns a, b, and c. So, if you have 1,000 columns
with names that are long integers, you can see how you could easily specify a range of columns
between 35 and 45. By using ranges, you can retrieve all of the columns in a range that you
define (called a range slice), or you can perform the same update to the items in the range using
a batch.
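
The inclusive behavior described above can be illustrated with a small sketch. This is not the database's API: the `slice_columns` helper and the dictionary standing in for a row are assumptions made for illustration, and Python's built-in ordering plays the role of the column family's comparator.

```python
from bisect import bisect_left, bisect_right


def slice_columns(row, start, finish):
    """Return (name, value) pairs whose names fall in [start, finish], inclusive.

    `row` is a hypothetical stand-in for a row: a dict of column name -> value.
    """
    names = sorted(row)  # sorted() stands in for the comparator's ordering
    lo = bisect_left(names, start)    # first position with name >= start
    hi = bisect_right(names, finish)  # first position with name > finish
    return [(name, row[name]) for name in names[lo:hi]]


# Columns a..e: the range (a, c) includes a, b, and c.
letters = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
print([name for name, _ in slice_columns(letters, "a", "c")])

# 1,000 columns named by integers: select the columns between 35 and 45.
wide_row = {n: f"value-{n}" for n in range(1, 1001)}
print([name for name, _ in slice_columns(wide_row, 35, 45)])
```

Because both endpoints are included, the second call selects eleven columns, not ten.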
You may have many hundreds of columns defined on a row, but you might not want to retrieve
all of them in a given query. Columns are stored in sorted order, so the range query is provided
so that you can fetch columns within a range of column names.
NOTE
Range queries over keys in a meaningful order require using an OrderPreservingPartitioner, so that
keys are returned in the order defined by the collation used by the partitioner.
When specifying a range query and using Random Partitioner, there's really no way to specify
a range more narrow than “all”. This is an expensive proposition, because you might
incur additional network operations. It can also potentially result in missed keys. That's because
an update that happens at the same time as your row scan, and that lands earlier in the index
than the position you are currently processing, will not be seen by the scan.
There is another thing that can be confusing at first. When you are using Random Partitioner,
you must recall that range queries first hash the keys. So if you are using a range of “Alice” to
“Alison”, the query will first run a hash on each of those keys and return not simply the natural
values between Alice and Alison, but rather the values between the hashes of those values.
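
To see why hashed keys do not keep their natural order, consider the following sketch. It assumes, as a simplification, that a key's position is the MD5 digest of the key read as an integer; the `token` function and the sample keys are hypothetical and are not the partitioner's real code.

```python
import hashlib


def token(key):
    """Hypothetical token: the MD5 digest of the key, read as a big integer."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


keys = ["Alice", "Alicia", "Alison", "Bob"]

natural_order = sorted(keys)           # order by the key text itself
token_order = sorted(keys, key=token)  # order by the hashed position

print("natural order:", natural_order)
print("token order:  ", token_order)
```

Names that are neighbors alphabetically get unrelated tokens, so the two orderings generally differ. A range expressed in token space therefore has no useful relationship to the alphabetical range between “Alice” and “Alison”.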
Here is the basic flow of a read operation that looks for a specific key when using Random
Partitioner. First, the key is hashed, and then the client connects to any node in the cluster. That
node will route the request to the node with that key. The memtable is consulted to see whether
your key is present; if it's not found, then a scan is performed on the Bloom filter for each file,
starting with the newest one. Once the key is found in the Bloom filter, it is used to consult the
corresponding datafile and find the column values.
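
The node-local part of this flow can be sketched as a toy model. Everything here is an assumption made for illustration: the `BloomFilter` and `DataFile` classes, the `read` function, and the in-memory dictionaries are hypothetical, and hashing the key and routing the request to the right node are left out.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive several bit positions per key by salting the hash input.
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class DataFile:
    """Hypothetical stand-in for an on-disk datafile and its Bloom filter."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read(key, memtable, datafiles):
    """Look up `key`: memtable first, then datafiles ordered newest first."""
    if key in memtable:
        return memtable[key]
    for datafile in datafiles:
        if datafile.bloom.might_contain(key):
            columns = datafile.rows.get(key)
            if columns is not None:  # guards against a false positive
                return columns
    return None


memtable = {"k1": {"a": 1}}
datafiles = [
    DataFile({"k2": {"b": 2}}),                  # newest file
    DataFile({"k2": {"b": 0}, "k3": {"c": 3}}),  # older file
]
print(read("k1", memtable, datafiles))
print(read("k2", memtable, datafiles))
print(read("k3", memtable, datafiles))
print(read("missing", memtable, datafiles))
```

The Bloom filter check lets the lookup skip files that cannot contain the key, so the datafile itself is consulted only when the filter reports a possible match.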