It's worth noting that OPP isn't more efficient for range queries than random partitioning; it
just provides ordering. It has the disadvantage of creating a ring that is potentially very lopsided,
because real-world data is typically not distributed evenly across the key space. As an analogy,
consider the values assigned to letters in a Scrabble game: Q and Z are rarely used, so they get
a high value. With OPP, you'll likely end up with lots of data on some nodes and much less data
on others. The nodes that hold a disproportionate share of the data, making the ring lopsided,
are often referred to as “hot spots.” Because of the ordering aspect, users are commonly attracted
to OPP early on. However, using OPP means that your operations team will need to manually
rebalance nodes periodically using Nodetool's loadbalance or move operations.
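The hot-spot effect can be illustrated with a small simulation. The following is a minimal sketch, not Cassandra's actual placement logic: the four node names, the first-letter range boundaries, and the skewed sample keys are all hypothetical, and the hashed placement uses a simple MD5-modulo shortcut as a stand-in for random partitioning.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

# Hypothetical four-node ring. Under an order-preserving scheme each node
# owns a contiguous slice of the key space, split here by first letter.
NODES = ["node-a", "node-b", "node-c", "node-d"]
ORDERED_UPPER_BOUNDS = ["g", "m", "s", "z"]  # inclusive upper bound per node


def ordered_owner(key: str) -> str:
    """Place a key by its sort order (order-preserving placement)."""
    index = bisect_left(ORDERED_UPPER_BOUNDS, key[0].lower())
    return NODES[min(index, len(NODES) - 1)]


def hashed_owner(key: str) -> str:
    """Place a key by a hash of its bytes (simplified random placement)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest, "big") % len(NODES)]


# Made-up, deliberately skewed key set: most keys start with "s".
keys = (
    [f"smith{i}" for i in range(700)]
    + [f"jones{i}" for i in range(200)]
    + [f"quinn{i}" for i in range(50)]
    + [f"zhang{i}" for i in range(50)]
)

ordered_load = Counter(ordered_owner(k) for k in keys)
hashed_load = Counter(hashed_owner(k) for k in keys)

for node in NODES:
    print(node, "ordered:", ordered_load[node], "hashed:", hashed_load[node])
```

With this sample, the ordered placement puts 750 of the 1,000 keys on a single node and none on another, while the hashed placement spreads them roughly evenly. That imbalance is what the manual rebalancing described above is meant to correct.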
If you want to perform range queries from your clients, you must use an order-preserving
partitioner or a collating order-preserving partitioner.
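To see why ordering matters for range queries, consider a minimal sketch in which keys are held in sorted order, as an order-preserving layout would keep them. The key names and the `range_slice` helper are hypothetical and only illustrate the idea; this is not how a Cassandra client issues a range query.

```python
from bisect import bisect_left, bisect_right

# Keys kept in sorted order, as an order-preserving layout would store them.
sorted_keys = sorted(["evans", "adams", "davis", "baker", "frank", "clark"])


def range_slice(keys, start, end):
    """Return every key with start <= key <= end from an already sorted list."""
    lo = bisect_left(keys, start)
    hi = bisect_right(keys, end)
    return keys[lo:hi]


result = range_slice(sorted_keys, "baker", "davis")
print(result)
```

Because the keys are sorted, the range is one contiguous slice. When keys are placed by hash instead, neighboring keys end up scattered around the ring, so there is no contiguous slice to read.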
Collating Order-Preserving Partitioner
This partitioner orders keys according to a United States English locale (EN_US). Like OPP, it
requires that the keys are UTF-8 strings. Although its name might imply that it extends the OPP,
it doesn't. Instead, this class extends AbstractByteOrderedPartitioner. This partitioner is
rarely employed, as its usefulness is limited.
Byte-Ordered Partitioner
New in 0.7, ByteOrderedPartitioner is an order-preserving partitioner that treats the data as
raw bytes, instead of converting them to strings the way the order-preserving partitioner and
collating order-preserving partitioner do. If you need an order-preserving partitioner that
doesn't validate your keys as being strings, BOP is recommended for the performance improvement.
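The difference between ordering raw bytes and ordering validated strings can be sketched as follows. This is only an illustration of byte-wise comparison using made-up keys and a hypothetical `is_utf8` helper; it does not reproduce the partitioner's implementation.

```python
# Hypothetical row keys. The third one is not valid UTF-8.
keys = [b"apple", b"Zebra", b"\xff\x00", b"banana"]

# Byte ordering compares unsigned byte values left to right, so no decoding
# or string validation is needed. Uppercase "Z" (0x5A) sorts before
# lowercase "a" (0x61), and 0xFF sorts last.
byte_ordered = sorted(keys)


def is_utf8(raw: bytes) -> bool:
    """Report whether the key could be accepted as a UTF-8 string."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for key in byte_ordered:
    print(key, "valid UTF-8" if is_utf8(key) else "raw bytes only")
```

A key such as `b"\xff\x00"` can still be ordered byte by byte, whereas a partitioner that requires UTF-8 strings would have to reject it.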
Snitches
The job of a snitch is simply to determine relative host proximity. Snitches gather some
information about your network topology so that Cassandra can efficiently route requests. The
snitch will figure out where nodes are in relation to other nodes. Inferring data centers is the
job of the replication strategy.
Simple Snitch
By default, Cassandra uses org.apache.cassandra.locator.EndPointSnitch. It operates
by simply comparing different octets in the IP addresses of each node. If two hosts have the same
value in the second octet of their IP addresses, then they are determined to be in the same data
center. If two hosts have the same value in the third octet of their IP addresses, then they are