might not even live on the same machine. Here is an example of how two consecutive row keys can have MD5 hashes that are far apart:
ROW KEY | MD5 HASH VALUE
--------+----------------------------------
1234 | 81dc9bdb52d04dc20036dbd8313ed055
1235 | 9996535e07258a7bbfd8b132435c5962
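The digests in the table can be recomputed with the JDK's MessageDigest. This is a minimal sketch that hashes the UTF-8 bytes of each key and prints the hex digest. The class and method names (Md5Keys, md5Hex) are hypothetical and used only for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Keys {
    // Returns the 32-character lowercase hex MD5 digest of a row key's UTF-8 bytes.
    static String md5Hex(String rowKey) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(rowKey.getBytes(StandardCharsets.UTF_8));
            // %032x left-pads with zeros so leading zero bytes are not lost.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // Print each key next to its digest, in the same layout as the table.
        for (String key : new String[] {"1234", "1235"}) {
            System.out.println(key + " | " + md5Hex(key));
        }
    }
}
```

The output can be compared line by line against the table: the keys differ only in their last digit, yet the digests share no obvious relationship.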
Let's take an example of two partitioners: ByteOrderedPartitioner, which preserves lexical ordering by bytes, and RandomPartitioner, which uses an MD5 hash of the row key to generate its token. Let's assume that we have a users_visits table with the row key <city>_<userId>. ByteOrderedPartitioner will let you iterate through rows to get more users from the same city, in much the same way as a SortedMap interface does (for more detail, visit http://docs.oracle.com/javase/6/docs/api/java/util/SortedMap.html).
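The SortedMap analogy can be made concrete with a TreeMap. This is a rough sketch, assuming ASCII row keys: for ASCII, Java's String ordering matches byte ordering, so all keys with the prefix <city>_ form one contiguous range that a single subMap call returns. The class OrderedScan, the helper cityRange, and the sample rows are hypothetical. They only illustrate how an order-preserving layout behaves and are not Cassandra code.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class OrderedScan {
    // Returns the contiguous slice of rows whose key starts with "<city>_".
    // For ASCII keys, String order equals byte order, so this mimics a range
    // scan over rows laid out by an order-preserving partitioner.
    static SortedMap<String, Integer> cityRange(SortedMap<String, Integer> rows, String city) {
        String from = city + "_";                      // inclusive lower bound
        String to = city + "_" + Character.MAX_VALUE;  // exclusive upper bound
        return rows.subMap(from, to);
    }

    public static void main(String[] args) {
        // Hypothetical users_visits rows: <city>_<userId> -> visit count.
        SortedMap<String, Integer> visits = new TreeMap<>();
        visits.put("paris_2001", 3);
        visits.put("london_1001", 5);
        visits.put("tokyo_3001", 1);
        visits.put("london_1002", 2);
        visits.put("paris_2002", 7);

        // Prints london_1001 -> 5, then london_1002 -> 2.
        cityRange(visits, "london").forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```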
With RandomPartitioner, however, a row's position is determined by the MD5 hash of <city>_<userId>, so two consecutive userIds from the same city may have records for a different city between them. We therefore cannot simply iterate and expect the rows to come out grouped by city, just as iterating over the entries of a HashMap gives no meaningful order. (Ideally, you would not use a row key such as <city>_<userId> for grouping at all; you would create a compound key with <city> and <userId>. The purpose of the preceding example was only to show that consecutive row keys may have other records between them.)
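To see why grouping breaks when rows are placed by hash, the sketch below sorts the same <city>_<userId> keys by an MD5-derived token instead of by the key itself, then counts how often neighbouring keys switch city. The token here is a simplification (the MD5 digest read as a non-negative BigInteger), not Cassandra's exact token code. The names HashedOrder, token, ringOrder, and citySwitches and the sample keys are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HashedOrder {
    // Simplified RandomPartitioner-style token: MD5 of the key bytes,
    // interpreted as a non-negative BigInteger.
    static BigInteger token(String rowKey) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, d);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the keys in ring order, i.e. sorted by token rather than by key.
    static List<String> ringOrder(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(HashedOrder::token));
        return sorted;
    }

    // Counts how often consecutive keys switch city; an ordering that keeps
    // each city together switches exactly (number of cities - 1) times.
    static int citySwitches(List<String> ordered) {
        int switches = 0;
        for (int i = 1; i < ordered.size(); i++) {
            String prev = ordered.get(i - 1).split("_")[0];
            String cur = ordered.get(i).split("_")[0];
            if (!prev.equals(cur)) switches++;
        }
        return switches;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("london_1001", "london_1002", "london_1003",
                                    "paris_2001", "paris_2002", "paris_2003");
        List<String> ring = ringOrder(keys);
        System.out.println("ring order: " + ring);
        System.out.println("city switches: " + citySwitches(ring)
                + " (1 would mean the two cities stayed grouped)");
    }
}
```

The exact ring order depends on the digests. Unlike the byte-ordered case, nothing guarantees that rows for one city sit next to each other.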
We will look at partitioners in more detail in Chapter 4, Deploying a Cluster. Note, however, that using the apparently more convenient ByteOrderedPartitioner is generally considered bad practice. There are a couple of reasons for this, the main one being uneven row key distribution across nodes, which can create a hotspot in the ring.
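The hotspot effect can be simulated with a toy four-node ring. This is a simplified model, not how Cassandra assigns tokens. In the order-preserving case, each node owns a contiguous key range bounded by the hypothetical split points "g", "n", and "t". In the hashed case, the 128-bit MD5 token space is cut into four equal ranges. The class RingLoad, its helpers, the split points, and the sample keys are all hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RingLoad {
    static final int NODES = 4;
    // Hypothetical split points for an order-preserving ring: node 0 owns keys
    // below "g", node 1 owns ["g", "n"), node 2 owns ["n", "t"), node 3 the rest.
    static final String[] SPLITS = {"g", "n", "t"};

    // Order-preserving placement: count how many split points the key passes.
    static int orderedNode(String key) {
        int node = 0;
        for (String split : SPLITS) {
            if (key.compareTo(split) >= 0) node++;
        }
        return node;
    }

    // Hashed placement: MD5 token in [0, 2^128), cut into NODES equal ranges.
    static int hashedNode(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            BigInteger token = new BigInteger(1, d);
            return token.multiply(BigInteger.valueOf(NODES)).shiftRight(128).intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical skewed workload: every key belongs to one of two cities.
    static List<String> sampleKeys(int perCity) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < perCity; i++) {
            keys.add("london_" + i);
            keys.add("paris_" + i);
        }
        return keys;
    }

    // Number of keys that land on each node under the chosen placement.
    static int[] load(List<String> keys, boolean hashed) {
        int[] counts = new int[NODES];
        for (String k : keys) counts[hashed ? hashedNode(k) : orderedNode(k)]++;
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = sampleKeys(1000);
        System.out.println("order-preserving: " + Arrays.toString(load(keys, false)));
        System.out.println("hashed:           " + Arrays.toString(load(keys, true)));
    }
}
```

With these inputs, the order-preserving counts come out as [0, 1000, 1000, 0]. Two nodes take every row and the other two sit idle. The hashed counts should be much closer to an even split, though the exact numbers depend on the digests.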