Replication in Cassandra and strategies
Replicating means creating a copy. This copy makes the data redundant and thus available
even when a node fails or goes down. In Cassandra, you can specify the replication factor
when creating a keyspace, or modify it later. The attributes that need to be specified in this
context are as follows (a short CQL sketch is given after the list):
Replication factor: This is a numeric value specifying the number of replicas to be kept
Strategy: This can be the simple strategy or the network topology strategy; it decides the
placement of replicas across the cluster
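For example, both attributes are supplied together in the keyspace definition. The following
is a minimal CQL sketch; the keyspace name demo_ks and the factor of 3 are assumptions
chosen purely for illustration:

    -- Hypothetical keyspace with three replicas, placed by the simple strategy
    CREATE KEYSPACE demo_ks
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

The same replication attributes can be changed later with an ALTER KEYSPACE statement.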
Internally, Cassandra uses the row key to decide where the replicas or copies of data are
stored across the various nodes in the cluster. A replication factor of n means there are n
copies of the data stored on n different nodes. There are certain rules of thumb with
replication, and they are as follows:
• The replication factor should never be more than the number of nodes in a cluster, or
you will run into exceptions due to not enough replicas being available; Cassandra will
start rejecting writes and reads, though replication of the existing data would continue
uninterrupted (see the sketch after this list)
• The replication factor should not be so small that data is lost forever if a single node
goes down permanently
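As an illustration of the first rule, suppose the hypothetical keyspace demo_ks from the
earlier sketch was created with a replication factor of 3, but the cluster has only two nodes;
the factor can be brought back in line with something like the following:

    -- Hypothetical: lower the replication factor to match a two-node cluster
    ALTER KEYSPACE demo_ks
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

A cleanup or repair is typically run afterwards so that the on-disk data matches the new
setting.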
A snitch is used to determine the physical location of nodes and attributes such as their
closeness to each other, which are of value when a vast amount of data is to be replicated and
moved to and fro. In all such situations, network latency plays a very important part. The
two strategies currently supported by Cassandra are as follows:
Simple: This is the default strategy provided by Cassandra for all keyspaces. It operates
within a single data center. It's pretty straightforward and simple in its operation; as the
name suggests, the partitioner checks the key against the node's token range to determine
the placement of the first replica. The subsequent replicas are then placed on the next
nodes in clockwise order around the ring. So if data item "A" has a replication factor of
"3", the partitioner decides the first node based on the key and token ownership, and from
this node onwards the remaining replicas are created in clockwise order.
Network: This is the strategy used when the Cassandra cluster is distributed across data
centers. Here, we can plan our replica placement and define how many replicas we want to
place in each data center, as shown in the sketch after this list. This approach makes the
data geo-redundant and thus more fail-safe in cases where an entire data center crashes.
There are two things you should consider when making a choice on replica placement
across data centers.
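In CQL, such per-data-center placement is expressed directly in the keyspace definition. The
sketch below is illustrative only; the keyspace name geo_ks, the data center names DC1 and
DC2, and the replica counts are assumptions, and the names must match the data centers
reported by the cluster's snitch:

    -- Hypothetical multi-data-center keyspace: three replicas in DC1, two in DC2
    CREATE KEYSPACE geo_ks
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2};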