In the preceding figure, the curve shows the heartbeat arrival distribution estimated from
past samples. It is used to calculate the value of ϕ based on the last arrival time, T_last,
and the current time, t_now. One may ask where heartbeats are sent in Cassandra. Gossip has it!
Gossip and failure detection
During gossip sessions, each node maintains a list of the arrival timestamps of gossip
messages from the other nodes. This list is basically a sliding window, which, in turn, is
used to calculate P_later. One may tune the sensitivity of failure detection by setting the
ϕ_thres threshold.
ϕ_thres can be understood like this. Let's say we start to suspect that a node is dead
when ϕ >= ϕ_thres. When ϕ_thres is 1, it is equivalent to -log10(0.1): the probability that
we will make a mistake (that is, the decision that the node is dead will be contradicted in
the future by a late-arriving heartbeat) is 0.1, or 10 percent. Similarly, with ϕ_thres = 2,
the probability of making a mistake goes down to 1 percent; with ϕ_thres = 3, it drops to
0.1 percent; and so on, following the base-10 logarithm. In general, the probability of a
mistake is 10^(-ϕ_thres).
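The relationship between the sliding window, P_later, and ϕ can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cassandra's actual code: the class name PhiAccrualSketch and the function mistake_probability are hypothetical, and P_later is modeled here with an exponential distribution over the mean heartbeat interval, which is one simple choice among several.

```python
import math
from collections import deque


class PhiAccrualSketch:
    """Toy accrual failure detector (hypothetical, for illustration only)."""

    def __init__(self, window_size=1000):
        # Sliding window of intervals between consecutive heartbeat arrivals.
        self.intervals = deque(maxlen=window_size)
        self.t_last = None  # arrival time of the most recent heartbeat

    def heartbeat(self, t_now):
        """Record a heartbeat that arrived at time t_now."""
        if self.t_last is not None:
            self.intervals.append(t_now - self.t_last)
        self.t_last = t_now

    def p_later(self, t_now):
        """Estimated probability that a heartbeat arrives later than t_now.

        Assumption: inter-arrival times are exponentially distributed
        with the mean taken from the sliding window.
        """
        mean = sum(self.intervals) / len(self.intervals)
        return math.exp(-(t_now - self.t_last) / mean)

    def phi(self, t_now):
        """phi = -log10(P_later(t_now - T_last)); 0.0 until samples exist."""
        if not self.intervals:
            return 0.0
        return -math.log10(self.p_later(t_now))


def mistake_probability(phi_thres):
    """Chance that a node declared dead at phi_thres is actually alive."""
    return 10.0 ** (-phi_thres)


detector = PhiAccrualSketch()
for arrival in (0.0, 1.0, 2.0, 3.0):  # one heartbeat per second
    detector.heartbeat(arrival)

for seconds_silent in (1, 3, 5, 10):
    print(seconds_silent, round(detector.phi(3.0 + seconds_silent), 3))
```

The longer the node stays silent, the higher ϕ climbs, so a larger ϕ_thres trades slower detection for fewer false alarms.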
Partitioner
Cassandra is a distributed database management system. This means it takes a single
logical database and distributes it over one or more machines in the database cluster. So,
when you insert some data in Cassandra with a unique row key, Cassandra uses that row key
to assign the row to one of the nodes, which is then responsible for managing it.
Let's try to understand this. Cassandra inherits its data model from Google's BigTable
(http://research.google.com/archive/bigtable.html). This means we can roughly assume that
the data is stored in some sort of table that has an unlimited number of columns (not
really unlimited; Cassandra limits the maximum number of cells to 2 billion per
partition), with each row having a unique key, namely the row key. Now, keeping terabytes
of data on one machine is restrictive from multiple points of view: disk space is limited,
parallel processing is limited, and, if the data is not duplicated, the machine is a single
point of failure. What Cassandra does is define some rules to slice the data across rows
and assign which node in the cluster is responsible for holding which slice. This task is
done by a partitioner.
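To make the idea concrete, here is a minimal sketch of how a hash-based partitioner could map row keys to nodes. It is an illustration under stated assumptions rather than Cassandra's implementation: the class RingSketch, the node names, and the evenly spaced tokens are all hypothetical, and MD5 reduced to the range 0 to 2^127 - 1 stands in for whatever token function a real partitioner uses.

```python
import bisect
import hashlib

TOKEN_SPACE = 2 ** 127  # assumed size of the token range for this sketch


class RingSketch:
    """Toy token ring: each node owns the keys up to and including its token."""

    def __init__(self, node_tokens):
        # node_tokens maps a node name to the token assigned to that node.
        self.ring = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.ring]

    @staticmethod
    def token_for(row_key):
        """Hash a row key into the token space (MD5 is an assumption here)."""
        digest = hashlib.md5(row_key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % TOKEN_SPACE

    def node_for(self, row_key):
        """Return the first node whose token is >= the key's token.

        Keys that hash past the largest token wrap around to the first node.
        """
        index = bisect.bisect_left(self.tokens, self.token_for(row_key))
        return self.ring[index % len(self.ring)][1]


# Four hypothetical nodes with evenly spaced tokens.
ring = RingSketch({
    "node-a": 0,
    "node-b": TOKEN_SPACE // 4,
    "node-c": TOKEN_SPACE // 2,
    "node-d": 3 * TOKEN_SPACE // 4,
})

for key in ("user:1", "user:2", "user:3"):
    print(key, "->", ring.node_for(key))
```

Because the hash spreads row keys roughly uniformly over the token space, each node ends up holding a similar share of the rows, and the same row key always lands on the same node.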
There are several types of partitioners to choose from. We'll discuss them in detail in
Chapter 4, Deploying a Cluster. In short, Cassandra (as of Version 1.2) offers three
partitioners, as follows: