Cassandra Architecture - Mastering Apache Cassandra

Database Reference

In-Depth Information

In the preceding figure, the curve shows the heartbeat arrival distribution estimate based

on past samples. It is used to calculate the value of ϕ based on last arrival, T last , and t now .

One may question as to where a heartbeat is being sent in Cassandra. Gossip has it!

Gossip and failure detection

During gossip sessions, each node maintains a list of the arrival time stamps of gossip

messages from the other nodes. This list is basically a sliding window, which, in turn, is

used to calculate P later . One may set the sensitivity of the ϕ thres threshold.

ϕ thres can be understood like this. Let's say we start to suspect whether a node is dead

when ϕ >= ϕ thres . When ϕ thres is 1, it is equivalent to - log(0.1). The probability that we

will make a mistake (that is, the decision that the node is dead will be contradicted in fu-

ture by a late arriving heartbeat) is 0.1 or 10 percent. Similarly, with ϕ thres = 2, the prob-

ability of making a mistake goes down to 1 percent; with ϕ thres = 3, it drops to 0.1 per-

cent; and so on, following log base 10 formula.

Partitioner

Cassandra is a distributed database management system. This means it takes a single lo-

gical database and distributes it over one or more machines in the database cluster. So,

when you insert some data in Cassandra with a unique row key, based on that row key,

Cassandra assigns that row to one of the nodes that's responsible for managing it.

Let's try to understand this. Cassandra inherits a data model from Google's BigTable ( ht-

tp://research.google.com/archive/bigtable.html ) . This means we can roughly assume that

the data is stored in some sort of a table that has an unlimited number of columns (not

really unlimited; Cassandra limits the maximum number of cells to be 2 billion per parti-

tion) with rows having a unique key, namely row key. Now, your terabytes of data on one

machine will be restrictive from multiple points of view. One is disk space, and another is

limited parallel processing, and if not duplicated, a source of single point of failure. What

Cassandra does is, it defines some rules to slice data across rows and assigns which node

in the cluster is responsible for holding which slice. This task is done by a partitioner.

There are several types of partitioners to choose from. We'll discuss them in detail in

Chapter 4 , Deploying a Cluster . In short, Cassandra (as of Version 1.2) offers three parti-

tioners, as follows:

Search WWH ::

Custom Search

Home