Read Repair
Read repair is another mechanism for ensuring consistency throughout the node ring. During a
read operation, if Cassandra detects that some nodes have responded with data that is
inconsistent with the response of other, newer nodes, it makes a note to perform a read repair
on the old nodes. In a read repair, Cassandra sends a write request to the nodes holding stale
data to bring them up to date with the newer data returned from the original read operation.
It does this by pulling all the data from the node, performing a merge, and writing the merged
data back to the nodes that were out of sync. Inconsistent data is detected by comparing
timestamps and checksums.
The reconciliation is implemented in the org.apache.cassandra.streaming package.
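The flow described above can be illustrated with a small, self-contained Python sketch. It is a toy model rather than Cassandra's implementation: the replicas are plain dictionaries, staleness is judged by timestamp only (checksums are left out), and the names Versioned and read_with_repair are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # write timestamp; a larger number means a newer write


def read_with_repair(replicas, key):
    """Read `key` from every replica, return the newest value, and
    overwrite any replica whose copy is missing or older."""
    responses = {name: store.get(key) for name, store in replicas.items()}
    present = [v for v in responses.values() if v is not None]
    if not present:
        return None, []
    # The newest timestamp wins the merge in this toy model.
    newest = max(present, key=lambda v: v.timestamp)
    repaired = []
    for name, seen in responses.items():
        if seen is None or seen.timestamp < newest.timestamp:
            replicas[name][key] = newest  # the "repair" write
            repaired.append(name)
    return newest.value, repaired


# Hypothetical three-node ring: one fresh copy, one stale copy, one missing.
replicas = {
    "node_a": {"user:1": Versioned("alice@new.example", 20)},
    "node_b": {"user:1": Versioned("alice@old.example", 10)},
    "node_c": {},
}
value, repaired = read_with_repair(replicas, "user:1")
print(value, repaired)
```

After the call, all three dictionaries hold the copy with timestamp 20, which is the state a read repair is meant to reach.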
Replication
In general distributed systems terms, replication refers to storing multiple copies of data on
multiple machines so that if one machine fails or becomes unavailable due to a Partition, the
cluster can still make data available. Caching is a simple form of replication. In Cassandra,
replication is a means of providing high performance and availability/fault-tolerance.
Replication Factor
Cassandra offers a configurable replication factor, which essentially lets you decide how
much you want to pay in performance to gain more consistency. That is, your consistency
level for reading and writing data is based on the replication factor, as it refers to the number
of nodes across which you have replicated data. The replication factor is set in the
configuration file or through the API.
See also Consistency Level.
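As a rough illustration of how the replication factor bounds the consistency level, the sketch below computes a majority quorum and checks the usual overlap rule: a read is guaranteed to reach at least one replica holding the latest acknowledged write when the number of write acknowledgements plus the number of read responses exceeds the replication factor. The function names are hypothetical, and the rule is stated here as general background rather than taken from this glossary.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return write_acks + read_acks > replication_factor


for rf in (1, 3, 5):
    q = quorum(rf)
    print(f"RF={rf}: quorum={q}, quorum reads overlap quorum writes: "
          f"{read_overlaps_write(rf, q, q)}")
```

With a replication factor of 3, for example, writing to one replica and reading from one replica does not satisfy the overlap rule, which is the performance-for-consistency trade the entry describes.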
Replication Strategy
The replication strategy, sometimes referred to as the placement strategy, determines how
replicas will be distributed. The first replica is always placed on the node claiming the key
range of its Token. All remaining replicas are distributed according to a configurable
replication strategy.
The Gang of Four Strategy pattern is employed to allow a pluggable means of replication,
but Cassandra comes with three strategies out of the box. Choosing the right replication
strategy is important because in determining which nodes are responsible for which key ranges,
you're also determining which nodes should receive write operations; this has a big impact on
efficiency in different scenarios. The variety of pluggable strategies allows you greater
flexibility, so that you can tune Cassandra according to your network topology and needs.
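The pluggable design described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any of Cassandra's shipped strategies: the ring is a sorted list of hypothetical nodes with integer tokens, each node claims the range ending at its own token, and the two strategy functions (clockwise and other_rack_first) are invented here only to show that the placement rule can be swapped without touching the caller.

```python
from bisect import bisect_left
from typing import Callable, List, NamedTuple


class Node(NamedTuple):
    token: int
    name: str
    rack: str


# A strategy takes (ring, index of the first replica, replication factor).
Strategy = Callable[[List[Node], int, int], List[str]]


def primary_index(ring: List[Node], key_token: int) -> int:
    """Index of the node claiming the range that contains key_token:
    the first node whose token is >= key_token, wrapping around the ring."""
    tokens = [n.token for n in ring]
    return bisect_left(tokens, key_token) % len(ring)


def clockwise(ring: List[Node], start: int, rf: int) -> List[str]:
    """Place the remaining replicas on the next nodes around the ring."""
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)].name for i in range(count)]


def other_rack_first(ring: List[Node], start: int, rf: int) -> List[str]:
    """Prefer nodes in racks not used yet, then fill in ring order."""
    count = min(rf, len(ring))
    order = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    chosen = [order[0]]  # the first replica always goes to the token owner
    used_racks = {order[0].rack}
    for node in order[1:]:
        if len(chosen) < count and node.rack not in used_racks:
            chosen.append(node)
            used_racks.add(node.rack)
    for node in order[1:]:
        if len(chosen) < count and node not in chosen:
            chosen.append(node)
    return [n.name for n in chosen]


def replicas_for(ring: List[Node], key_token: int, rf: int, strategy: Strategy) -> List[str]:
    return strategy(ring, primary_index(ring, key_token), rf)


ring = [Node(0, "n1", "r1"), Node(25, "n2", "r1"),
        Node(50, "n3", "r2"), Node(75, "n4", "r2")]
print(replicas_for(ring, 80, 2, clockwise))
print(replicas_for(ring, 80, 2, other_rack_first))
```

In both calls the first replica lands on the node that owns the key's token range; only the placement of the remaining replica changes with the strategy, and with it the set of nodes that receive the write.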