might not even live on the same machine. Here is an example of how two seemingly consecutive row keys have MD5 hashes that are far apart:
ROW KEY | MD5 HASH VALUE
--------+----------------------------------
1234 | 81dc9bdb52d04dc20036dbd8313ed055
1235 | 9996535e07258a7bbfd8b132435c5962
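The hash values in the table can be reproduced with a few lines of Python. This is only a minimal sketch using the standard hashlib module; the helper name md5_hex is made up for illustration and is not part of any Cassandra API.

```python
import hashlib


def md5_hex(row_key: str) -> str:
    """Return the MD5 digest of a row key as a 32-character hex string."""
    return hashlib.md5(row_key.encode("utf-8")).hexdigest()


# Print the two neighbouring row keys next to their hash values.
for row_key in ("1234", "1235"):
    print(row_key, "|", md5_hex(row_key))
```

Although the two keys differ only in their last character, their digests share no useful ordering relationship.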
Let's take two partitioners as an example: ByteOrderedPartitioner, which preserves the lexical ordering of row keys by bytes, and RandomPartitioner, which places rows by the MD5 hash of the row key. Let's assume that we have a users_visits table with a row key, <city>_<userId>.
ByteOrderedPartitioner will let you iterate through rows to get more users from the same city, in much the same way as a SortedMap interface does (for more detail, visit http://docs.oracle.com/javase/6/docs/api/java/util/SortedMap.html).
However, with RandomPartitioner, rows are ordered by the MD5 hash value of <city>_<userId>, so two consecutive userIds from the same city may have records for a different city in between. So we cannot just iterate and expect grouping to work; it would be like iterating over the entries of a HashMap. (Ideally, you would not want to use a row key <city>_<userId> for grouping. You would create a compound key with <city> and <userId>. The purpose of the preceding example was just to show that consecutive row keys may have records between them.)
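The difference between the two orderings can be imitated outside Cassandra. The sketch below is an illustration only: the sample keys and the helpers md5_token and range_scan are hypothetical, and sorting a Python list merely stands in for the order in which a partitioner would lay out rows.

```python
import hashlib


def md5_token(key: str) -> str:
    """Hex MD5 digest of the key, used here as a stand-in for a hash-based token."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Made-up <city>_<userId> row keys.
keys = ["paris_101", "london_7", "paris_102", "london_8", "tokyo_3", "paris_103"]

# Order-preserving layout: rows sorted by the raw bytes of the key.
byte_order = sorted(keys, key=lambda k: k.encode("utf-8"))
# Hash-based layout: rows sorted by the MD5 digest of the key.
hash_order = sorted(keys, key=md5_token)


def range_scan(ordered_keys, prefix):
    """Return the first contiguous run of keys that start with `prefix`."""
    run = []
    for key in ordered_keys:
        if key.startswith(prefix):
            run.append(key)
        elif run:
            break  # the run ended: a key without the prefix follows it
    return run


print(range_scan(byte_order, "paris_"))  # all Paris users, in one run
print(range_scan(hash_order, "paris_"))  # may stop early if another city intervenes
```

Under byte ordering the scan always returns every key for the city, because the shared prefix keeps them adjacent. Under hash ordering the same scan carries no such guarantee.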
Using the obviously better-looking partitioner, ByteOrderedPartitioner, is considered a bad practice. There are a couple of reasons for this, the major one being an uneven row key distribution across nodes, which can potentially cause a hotspot in the ring.
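The hotspot effect can be sketched with a toy ring. Everything here is an assumption made for illustration: the four nodes, the split points, the sample keys, and the simplified token mapping, which divides the full 128-bit MD5 range evenly and does not reproduce Cassandra's own token arithmetic.

```python
import bisect
import hashlib
from collections import Counter

NODES = 4
# Hypothetical split points for an order-preserving ring:
# node 0 owns keys below "g", node 1 owns "g" up to "n", and so on.
SPLIT_POINTS = ["g", "n", "t"]


def node_by_byte_order(key: str) -> int:
    """Pick the node whose key range contains the key itself."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def node_by_md5(key: str) -> int:
    """Pick the node whose token range contains the MD5 hash of the key."""
    token = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")
    return token * NODES // 2**128


# A skewed workload: every visit comes from the same city.
keys = [f"london_{user_id}" for user_id in range(1000)]

byte_order_load = Counter(node_by_byte_order(k) for k in keys)
md5_load = Counter(node_by_md5(k) for k in keys)

print("byte order:", dict(byte_order_load))
print("md5 hash:  ", dict(md5_load))
```

With the order-preserving mapping, every key sharing the london_ prefix lands on the same node, while hashing spreads the same keys over the whole ring.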