Consistency management
In terms of the CAP theorem, Cassandra's architecture favors availability and partition tolerance (AP) with eventual consistency. Consistency in Cassandra is measured by how up to date and synchronized all replicas of a row of data are. Although the database is built on an eventual consistency model, real-world applications often demand stronger consistency for read and write operations. To reconcile these needs, Cassandra provides a model called tunable consistency, in which the client application decides the level of consistency it requires. This gives the user the flexibility to run different classes of applications at different levels of consistency. There are additional built-in repair mechanisms for consistency management and tuning. A key point to remember is that the consistency a request actually achieves depends on the replication factor configured in Cassandra.
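How many replicas a given level involves scales with the replication factor; QUORUM, for instance, requires a majority of replicas. A minimal sketch of that relationship in Python (the function name is purely illustrative):

def quorum_size(replication_factor: int) -> int:
    # Replicas that must respond at QUORUM: floor(RF / 2) + 1.
    return replication_factor // 2 + 1

# With a replication factor of 3, QUORUM waits for 2 replicas;
# with a replication factor of 5, it waits for 3.
assert quorum_size(3) == 2
assert quorum_size(5) == 3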
Write consistency
Because consistency in Cassandra is tunable per request, a write operation can specify its desired level of consistency, and Cassandra lets you choose between weak and strong consistency levels. The available consistency levels are shown in Table 4.2.
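As an illustration of a write carrying its own consistency level, the following sketch uses the DataStax Python driver; the contact point, keyspace, and table are hypothetical:

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['127.0.0.1']).connect('sales')  # hypothetical keyspace

# The INSERT is acknowledged only after a majority of replicas accept it.
insert = SimpleStatement(
    "INSERT INTO customers (id, state) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(insert, (42, 'IL'))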
Read consistency
The read consistency level specifies how many replicas must respond before a result is returned to the client application. When a read request is made, Cassandra checks the specified number of replicas and returns the most recent data, as determined by the write timestamps, to satisfy the read request.
Note: The LOCAL_QUORUM and EACH_QUORUM levels are defined for large multi-data-center configurations.
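In a multi-data-center deployment, a client can pin requests to its local data center and use LOCAL_QUORUM so that only local replicas must respond. A minimal sketch with the DataStax Python driver, assuming a data center named 'dc1':

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy

# LOCAL_QUORUM waits for a quorum of replicas in the local data center only,
# avoiding a cross-data-center round trip on every request.
profile = ExecutionProfile(
    load_balancing_policy=DCAwareRoundRobinPolicy(local_dc='dc1'),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
cluster = Cluster(['127.0.0.1'], execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()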
Specifying client consistency levels
The consistency level is specified by the client application when a read or write request is made. For example, in early versions of CQL:
SELECT * FROM CUSTOMERS WHERE STATE='IL' USING CONSISTENCY QUORUM;
The USING CONSISTENCY clause belongs to CQL 2 and was removed in CQL 3; current clients set the consistency level per session or per statement through the driver instead.
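A present-day equivalent of the query above, with the level attached to the statement through the DataStax Python driver (connection details hypothetical):

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['127.0.0.1']).connect('sales')  # hypothetical keyspace

# The consistency level travels with the statement, not with the CQL text.
# This assumes an index on state; otherwise ALLOW FILTERING would be needed.
query = SimpleStatement(
    "SELECT * FROM customers WHERE state = 'IL'",
    consistency_level=ConsistencyLevel.QUORUM,
)
for row in session.execute(query):
    print(row)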
Built-in consistency repair features
Cassandra has a number of built-in repair features to ensure that data remains consistent across replicas:
Read repair. A read repair is a technique that ensures that the replicas of a row are synchronized with the latest version of its data. When Cassandra detects during a read that some replicas in the cluster are out of sync, it marks them with a read repair flag. This triggers a process of synchronizing the stale replicas with the newest version of the data requested. The check for inconsistent data is implemented by comparing the timestamp of each replica's data with the timestamp of the newest data; any replica whose timestamp is older than the newest data is effectively flagged as stale.
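The staleness check can be pictured with a small sketch; this illustrates the timestamp comparison only and is not Cassandra's actual implementation:

def stale_replicas(replica_data):
    # replica_data maps node -> (write_timestamp, value); the nodes whose
    # copy is older than the newest timestamp are the ones read repair updates.
    newest = max(ts for ts, _ in replica_data.values())
    return [node for node, (ts, _) in replica_data.items() if ts < newest]

# Node 'c' holds an older version and would be repaired with the
# value written at timestamp 1700000002.
responses = {
    'a': (1700000002, 'IL'),
    'b': (1700000002, 'IL'),
    'c': (1700000001, 'IN'),
}
print(stale_replicas(responses))  # ['c']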
Anti-entropy node repair. This is a process run as part of maintenance using the nodetool utility. It is a synchronization operation across the entire cluster in which the nodes are updated to be consistent. It is not an automatic process and needs manual intervention. During this process, the nodes exchange information represented as Merkle trees, and if the tree information is not consistent, a reconciliation exercise is carried out. This feature comes from Amazon Dynamo, with the difference being that in Cassandra each column family maintains its own Merkle tree.
Note: A Merkle tree is a hierarchical hash verification and authentication technique. When replicas have been out of sync for extended periods, comparing Merkle trees lets the nodes check small portions of the replicas to locate where synchronization is broken, enabling a quick recovery. (For more information on Merkle trees, see Ralph Merkle's website at www.merkle.com.)
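To make the idea concrete, here is a minimal sketch of building and comparing Merkle trees over hashed data ranges; it illustrates the divergence-finding principle rather than Cassandra's repair code, and assumes a power-of-two number of ranges for brevity:

import hashlib

def digest(data):
    return hashlib.sha256(data).digest()

def merkle_levels(leaves):
    # Build the tree bottom-up; each parent hashes its two children.
    levels = [[digest(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([digest(prev[i] + prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return levels

def diverging_ranges(a, b):
    # Equal root hashes mean the replicas agree and no data moves; otherwise
    # only the mismatched leaf ranges need reconciliation. (A full version
    # would walk top-down, descending only into subtrees that differ.)
    tree_a, tree_b = merkle_levels(a), merkle_levels(b)
    if tree_a[-1] == tree_b[-1]:
        return []
    return [i for i, (x, y) in enumerate(zip(tree_a[0], tree_b[0])) if x != y]

# Replica B's third range has drifted; only that range is repaired.
ranges_a = [b'r0', b'r1', b'r2', b'r3']
ranges_b = [b'r0', b'r1', b'r2-stale', b'r3']
print(diverging_ranges(ranges_a, ranges_b))  # [2]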