Replication in Cassandra and strategies
Replicating means creating a copy. This copy makes the data redundant and thus available
even when a node fails or goes down. In Cassandra, you can specify the replication factor
when creating a keyspace, or modify it later. The attributes that need to be specified in this
context are as follows (a short CQL sketch is given after the list):
Replication factor: This is a numeric value specifying the number of replicas to be kept
Strategy: This can be the simple strategy or the network topology strategy; it decides the
placement of replicas across the cluster
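For example, both attributes are supplied together in the keyspace definition. The following
is a minimal CQL sketch; the keyspace name demo_ks and the factor of 3 are assumptions
chosen purely for illustration:

    -- Hypothetical keyspace with three replicas, placed by the simple strategy
    CREATE KEYSPACE demo_ks
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

The same replication attributes can be changed later with an ALTER KEYSPACE statement.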
Internally, Cassandra uses the row key to decide where the replicas or copies of data are
stored across the various nodes in the cluster. A replication factor of n means there are n
copies of the data stored on n different nodes. There are certain rules of thumb with
replication, and they are as follows:
• The replication factor should never be more than the number of nodes in a cluster, or
you will run into exceptions due to not enough replicas being available; Cassandra will
start rejecting writes and reads, though replication of the existing data would continue
uninterrupted (see the sketch after this list)
• The replication factor should not be so small that data is lost forever if a single node
goes down permanently
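As an illustration of the first rule, suppose the hypothetical keyspace demo_ks from the
earlier sketch was created with a replication factor of 3, but the cluster has only two nodes;
the factor can be brought back in line with something like the following:

    -- Hypothetical: lower the replication factor to match a two-node cluster
    ALTER KEYSPACE demo_ks
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

A cleanup or repair is typically run afterwards so that the on-disk data matches the new
setting.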
A snitch is used to determine the physical location of nodes and attributes such as their
closeness to each other, which are of value when a vast amount of data is to be replicated and
moved to and fro. In all such situations, network latency plays a very important part. The
two strategies currently supported by Cassandra are as follows:
Simple: This is the default strategy provided by Cassandra for all keyspaces. It operates
within a single data center. It's pretty straightforward and simple in its operation; as the
name suggests, the partitioner checks the key against the node's token range to determine
the placement of the first replica. The subsequent replicas are then placed on the next
nodes in clockwise order around the ring. So if data item "A" has a replication factor of
"3", the partitioner decides the first node based on the key and token ownership, and from
this node onwards the remaining replicas are created in clockwise order.
Network: This is the strategy used when the Cassandra cluster is distributed across data
centers. Here, we can plan our replica placement and define how many replicas we want to
place in each data center, as shown in the sketch after this list. This approach makes the
data geo-redundant and thus more fail-safe in cases where an entire data center crashes.
There are two things you should consider when making a choice on replica placement
across data centers.
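In CQL, such per-data-center placement is expressed directly in the keyspace definition. The
sketch below is illustrative only; the keyspace name geo_ks, the data center names DC1 and
DC2, and the replica counts are assumptions, and the names must match the data centers
reported by the cluster's snitch:

    -- Hypothetical multi-data-center keyspace: three replicas in DC1, two in DC2
    CREATE KEYSPACE geo_ks
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2};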