D. Since every node knows which partitioner and which snitch are configured, each node
knows which nodes hold which row keys.
We have seen that partitioning has a drastic effect on data movement and distribution,
so a bad partitioner can lead to uneven data distribution. In fact, the example ring in
the previous paragraph might use a bad partitioner. For a dataset in which terms starting
with one specific letter are far more numerous than terms starting with other letters,
the ring will be lopsided. A good partitioner is one that calculates the position from
the row key quickly and distributes the row keys evenly, such as a partitioner based on
a consistent hashing algorithm.
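The difference between the two kinds of partitioner can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four node names, the `first_letter_partitioner` and `hash_partitioner` helpers, and the skewed key set are hypothetical, and MD5 is used only as a convenient standard-library hash that spreads keys over a token space. It is not Cassandra's actual implementation.

```python
import hashlib
from collections import Counter

NODES = ["A", "B", "C", "D"]  # hypothetical four-node ring


def first_letter_partitioner(row_key: str) -> str:
    """Assign a key by its first letter; assumes keys start with a-z."""
    letter_index = ord(row_key[0].lower()) - ord("a")
    return NODES[letter_index * len(NODES) // 26]


def hash_partitioner(row_key: str) -> str:
    """Hash the key to a 128-bit token; each node owns an equal token range."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    token = int.from_bytes(digest, "big")
    return NODES[token * len(NODES) // 2**128]


def load(partitioner, keys) -> Counter:
    """Count how many keys each node would hold under a partitioner."""
    return Counter(partitioner(key) for key in keys)


# A skewed dataset: most terms start with the letter "s".
keys = [f"s{i}" for i in range(900)]
keys += [f"{letter}{i}" for letter in "ahz" for i in range(100)]

letter_load = load(first_letter_partitioner, keys)
hash_load = load(hash_partitioner, keys)

if __name__ == "__main__":
    print("first-letter:", dict(letter_load))
    print("hash-based:  ", dict(hash_load))
```

With this key set, the first-letter scheme places every key starting with "s" on a single node, while the hash-based scheme ignores the spelling of the key and spreads the load roughly evenly.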
Replication
Cassandra runs on commodity hardware and works reliably during network partitions.
However, this comes at a cost: replication. To keep data accessible when a node goes
down or becomes unreachable, one must replicate data to more than one node. Replication
provides fault tolerance and removes any single point of failure from the system.
Cassandra provides more than one strategy for replicating data, and one can configure
the replication factor while creating a keyspace. This is discussed in detail in
Chapter 3, Effective CQL.
Replication is tightly bound to the consistency level (CL). CL can be thought of as an
answer to the question: how many replicas must respond positively to declare an operation
successful? With a read consistency level of three, a client is returned a successful
read as soon as three replicas respond with the data. The same goes for writes: with a
write consistency level of three, at least three replicas must confirm that the write to
them was successful. Obviously, the replication factor must be at least as large as any
consistency level in use; otherwise there will never be enough replicas to write to, or
read from, successfully.
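The relationship between the two settings can be sketched as a small check. The function name `operation_succeeds` and its parameters are hypothetical and only model the counting rule described above; a real cluster also deals with timeouts, retries, and named consistency levels.

```python
def operation_succeeds(replication_factor: int, consistency_level: int, acks: int) -> bool:
    """Return True when enough replicas acknowledged the read or write.

    acks is the number of replicas that responded positively.
    """
    if consistency_level > replication_factor:
        # Only replication_factor replicas exist, so this level can never be met.
        raise ValueError("consistency level exceeds replication factor")
    if not 0 <= acks <= replication_factor:
        raise ValueError("acks must be between 0 and the replication factor")
    return acks >= consistency_level


if __name__ == "__main__":
    # Three copies of the data, consistency level three: every replica must answer.
    print(operation_succeeds(replication_factor=3, consistency_level=3, acks=3))
    print(operation_succeeds(replication_factor=3, consistency_level=3, acks=2))
```

In this model a consistency level of three with a replication factor of three tolerates no unavailable replica, whereas a consistency level of one succeeds as soon as any single replica answers.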
Note
Do not confuse the replication factor with the number of nodes in the system. The
replication factor is the number of copies of each piece of data. The number of nodes
only affects how much data each node holds, based on the configured partitioner.
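The distinction in this note can be made concrete with a rough sketch. It assumes a simplified placement rule in which the copies of a row go to the owning node and the nodes that follow it on the ring, and it uses `row % node_count` as a stand-in for an evenly distributing partitioner. The helper names `replica_nodes` and `rows_per_node` are hypothetical.

```python
from collections import Counter


def replica_nodes(primary: int, node_count: int, replication_factor: int) -> list:
    """Pick the owning node plus the next nodes clockwise on the ring."""
    if replication_factor > node_count:
        raise ValueError("not enough nodes to hold that many distinct copies")
    return [(primary + step) % node_count for step in range(replication_factor)]


def rows_per_node(row_count: int, node_count: int, replication_factor: int) -> Counter:
    """Count the row copies each node stores under the simplified placement."""
    load = Counter()
    for row in range(row_count):
        primary = row % node_count  # stand-in for an even partitioner
        load.update(replica_nodes(primary, node_count, replication_factor))
    return load


if __name__ == "__main__":
    for node_count in (4, 6):
        load = rows_per_node(row_count=1200, node_count=node_count, replication_factor=3)
        print(node_count, "nodes:", dict(load), "total copies:", sum(load.values()))
```

Under these assumptions, 1,200 rows with a replication factor of three always produce 3,600 stored copies. Going from four nodes to six only changes the share each node holds, from 900 copies per node to 600.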
Replication should be thought of as added redundancy. One should never run with a
replication factor of one in a production environment. If you are concerned that writing
to multiple replicas will slow down writes, you can choose a consistency level that favors write speed.
Cassandra offers a set of consistency levels, including fire and forget, CL ZERO, and en-