Configuring Cassandra - Cassandra: The Definitive Guide


It's worth noting that OPP isn't more efficient for range queries than random partitioning; it just provides ordering. It has the disadvantage of creating a ring that is potentially very lopsided, because real-world data typically is not written evenly across the key space. As an example, consider the values assigned to letters in a Scrabble game. Q and Z are rarely used, so they get a high value. Row keys behave the same way: some ranges of keys are used far more heavily than others. With OPP, you'll likely end up with lots of data on some nodes and much less data on other nodes. The nodes on which lots of data is stored, making the ring lopsided, are often referred to as “hot spots.” Because of the ordering aspect, users are commonly attracted to OPP early on. However, using OPP means that your operations team will need to manually rebalance nodes periodically using Nodetool's loadbalance or move operations.
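The lopsided ring described above can be illustrated with a small sketch. It is not Cassandra's implementation: the keys, the four alphabetical range boundaries, and the use of a simple modulo in place of a real token ring are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

# Hypothetical row keys: like real-world data, they cluster in a few ranges.
KEYS = ["apple", "apricot", "avocado", "banana", "cherry", "salmon",
        "sardine", "spinach", "squash", "steak", "sugar", "tomato"]

# Assumed layout: four nodes, each owning an alphabetical range that ends
# at the given upper bound ("~" sorts after every lowercase letter).
OPP_BOUNDS = ["g", "m", "t", "~"]

def opp_node(key):
    """Order-preserving placement: first range whose upper bound is >= key."""
    return bisect_left(OPP_BOUNDS, key)

def random_node(key, nodes=4):
    """Hash-based placement: MD5 of the key, reduced modulo the node count.

    A real ring assigns token ranges rather than using modulo; this is
    only a stand-in to show that hashing discards key order.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % nodes

opp_load = Counter(opp_node(k) for k in KEYS)
random_load = Counter(random_node(k) for k in KEYS)

print("order-preserving:", sorted(opp_load.items()))
print("hash-based:      ", sorted(random_load.items()))
```

With these sample keys, the order-preserving placement puts eleven of the twelve keys on two nodes and none on the node owning the "g" to "m" range, which is the kind of hot spot that would need manual rebalancing.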

If you want to perform range queries from your clients, you must use an order-preserving partitioner or a collating order-preserving partitioner.

Collating Order-Preserving Partitioner

This partitioner orders keys according to a United States English locale (EN_US). Like OPP, it requires that the keys are UTF-8 strings. Although its name might imply that it extends the OPP, it doesn't. Instead, this class extends AbstractByteOrderedPartitioner. This partitioner is rarely employed, as its usefulness is limited.

Byte-Ordered Partitioner

New for 0.7, the team added ByteOrderedPartitioner, which is an order-preserving partitioner that treats keys as raw bytes, instead of converting them to strings the way the order-preserving partitioner and collating order-preserving partitioner do. If you need an order-preserving partitioner that doesn't validate your keys as being strings, BOP is recommended for the performance improvement.
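As a rough illustration of what ordering by raw bytes means, the sketch below sorts a few hypothetical keys by unsigned byte value and then checks which of them would also pass as UTF-8 strings. It uses only plain Python and does not call any Cassandra code; the key values are made up.

```python
# Hypothetical row keys as raw bytes; the last one is not valid UTF-8.
keys = [b"user:10", b"user:2", b"User:1", b"\xff\x00\x01"]

# Byte-ordered comparison: Python compares bytes objects lexicographically
# by unsigned byte value, with no decoding or validation step.
byte_ordered = sorted(keys)

def is_utf8(key):
    """Return True when the key could be treated as a UTF-8 string."""
    try:
        key.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Keys that a string-based, order-preserving scheme could accept.
string_compatible = [k for k in keys if is_utf8(k)]

print(byte_ordered)
print(string_compatible)
```

Note that byte order is not numeric order: `b"user:10"` sorts before `b"user:2"` because the byte for `1` is smaller than the byte for `2`. Keys that need numeric ordering would have to be encoded with that in mind, for example by zero-padding.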

Snitches

The job of a snitch is simply to determine relative host proximity. Snitches gather some information about your network topology so that Cassandra can efficiently route requests. The snitch will figure out where nodes are in relation to other nodes. Inferring data centers is the job of the replication strategy.

Simple Snitch

By default, Cassandra uses org.apache.cassandra.locator.EndPointSnitch. It operates by simply comparing different octets in the IP addresses of each node. If two hosts have the same value in the second octet of their IP addresses, then they are determined to be in the same data center. If two hosts have the same value in the third octet of their IP addresses, then they are determined to be in the same rack.
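The octet comparison can be sketched in a few lines. This is an illustration of the idea only, not the EndPointSnitch source; the function names and the sample addresses are hypothetical.

```python
def octets(ip):
    """Split a dotted-quad IPv4 address into its four integer octets."""
    parts = [int(p) for p in ip.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return parts

def same_octet(ip_a, ip_b, position):
    """True when both addresses share the octet at the 1-based position."""
    return octets(ip_a)[position - 1] == octets(ip_b)[position - 1]

def same_data_center(ip_a, ip_b):
    """Second-octet comparison, as described for the simple snitch."""
    return same_octet(ip_a, ip_b, 2)

# Hypothetical node addresses.
node_a, node_b, node_c = "10.20.30.1", "10.20.31.5", "10.21.30.1"

print(same_data_center(node_a, node_b))  # second octets are both 20
print(same_data_center(node_a, node_c))  # second octets are 20 and 21
print(same_octet(node_a, node_b, 3))     # third octets are 30 and 31
```

Because the comparison relies entirely on how addresses were assigned, it only gives meaningful answers when the network's IP scheme actually mirrors its physical layout.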
