Effective CQL - Mastering Apache Cassandra


might not even live on the same machine. Here is an example of how two seemingly consecutive row keys have MD5 hashes that are far apart:

ROW KEY | MD5 HASH VALUE
--------+----------------------------------
1234    | 81dc9bdb52d04dc20036dbd8313ed055
1235    | 9996535e07258a7bbfd8b132435c5962
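The hashes in the table can be recomputed with a few lines of Python. This is only a sketch of the hashing step: the helper name `md5_hex` is made up for illustration, and the keys are assumed to be hashed as UTF-8 strings.

```python
import hashlib

def md5_hex(row_key: str) -> str:
    """Return the MD5 digest of a row key as a 32-character hex string."""
    # Assumption: the key is hashed as its UTF-8 encoded bytes.
    return hashlib.md5(row_key.encode("utf-8")).hexdigest()

# The two seemingly consecutive row keys from the table.
hashes = {key: md5_hex(key) for key in ("1234", "1235")}

for key, digest in hashes.items():
    print(f"{key} | {digest}")
```

The two keys differ only in their last character, yet the printed digests share no obvious relationship, which is the point of the example.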

Let's take an example of two partitioners: ByteOrderedPartitioner, which preserves the lexical ordering of row keys by bytes, and RandomPartitioner, which uses an MD5 hash of the row key to generate its token. Let's assume that we have a users_visits table with a row key of the form <city>_<userId>. ByteOrderedPartitioner will let you iterate through rows to get more users from the same city, in much the same way as a SortedMap interface does (for more detail, visit http://docs.oracle.com/javase/6/docs/api/java/util/SortedMap.html).

However, with RandomPartitioner, rows are ordered by the MD5 hash value of <city>_<userId>, so two consecutive userIds from the same city may have records for a different city in between. So, we cannot just iterate and expect grouping to work; it is like iterating over the entries of a HashMap, where the order is unrelated to the keys. (Ideally, you would not want to use a row key <city>_<userId> for grouping. You would create a compound key with <city> and <userId>. The purpose of the preceding example was just to show that consecutive row keys may have records between them.)
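The difference in scan order can be sketched in Python by sorting the same row keys two ways. The city names, the user IDs, and the helper names `md5_token` and `cities_in_scan_order` are all made up for illustration. Treating the MD5 digest as an unsigned integer is a simplification of how RandomPartitioner derives a token.

```python
import hashlib

def md5_token(row_key: str) -> int:
    """Simplified token: the MD5 digest read as an unsigned integer."""
    return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)

# Hypothetical <city>_<userId> row keys.
row_keys = ["austin_1", "austin_2", "boston_1", "boston_2",
            "chicago_1", "chicago_2"]

# Byte order: what an order-preserving partitioner would scan.
byte_order = sorted(row_keys, key=lambda k: k.encode("utf-8"))

# Hash order: what a scan in token order would return.
hash_order = sorted(row_keys, key=md5_token)

def cities_in_scan_order(keys):
    """Keep only the <city> part of each key, in scan order."""
    return [k.split("_")[0] for k in keys]

print(cities_in_scan_order(byte_order))  # cities stay grouped together
print(cities_in_scan_order(hash_order))  # cities may be interleaved
```

In byte order, every city forms one contiguous run. In hash order, nothing guarantees this, so a scan that stops at the first row of a different city may miss rows.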

We will see partitioners in further detail in Chapter 4, Deploying a Cluster. But using the obviously better-looking partitioner, ByteOrderedPartitioner, is considered bad practice. There are a couple of reasons for this, the major one being an uneven row key distribution across nodes, which can potentially cause a hotspot in the ring.
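A toy simulation can make the hotspot concrete. Everything here is an assumption made for illustration: a four-node ring, token ranges split evenly across the key space, an order-preserving placement that looks only at the first byte of the key, and made-up city names. Real clusters assign tokens differently, so read this as a sketch of the skew, not of Cassandra's actual placement logic.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical ring size

def byte_ordered_node(row_key: str) -> int:
    """Place a key by its raw bytes; only the first byte picks the range."""
    return row_key.encode("utf-8")[0] * NODES // 256

def md5_node(row_key: str) -> int:
    """Place a key by its MD5 digest, read as an unsigned 128-bit integer."""
    token = int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)
    return token * NODES // 2**128

# Hypothetical <city>_<userId> row keys: 4 cities x 250 users.
cities = ["london", "mumbai", "newyork", "paris"]
row_keys = [f"{city}_{user_id}" for city in cities for user_id in range(250)]

byte_ordered_load = Counter(byte_ordered_node(k) for k in row_keys)
md5_load = Counter(md5_node(k) for k in row_keys)

print("byte-ordered:", dict(byte_ordered_load))
print("md5-hashed:  ", dict(md5_load))
```

Because every key starts with a lowercase ASCII letter, the order-preserving placement sends all 1,000 rows to the same node, while the hashed placement spreads them across the ring.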
