Databases Reference
In-Depth Information
Many concepts in Hypertable and HBase are the same and so repeating those same concepts here
is not benefi cial. However, a passing reference to an idea that is important in both places but not
discussed yet is that of a Bloom Filter.
Bloom Filter
Bloom Filter is a probabilistic data structure used to test whether an element is a member of a set.
Think of a Bloom Filter as an array of m number of bits. An empty Bloom Filter has a value of 0 in
all its m positions. Now if elements a , b , and c are members of a set, then they are mapped via a set
of k hash functions to the Bloom Filter. This means each of the members, that is a , b , and c are
mapped via the k different hash functions to k positions on the Bloom Filter. Whenever a member
is mapped via a hash function to a specifi c position in the m bit array the value in that particular
position is set to 1. Different members, when passed through the hash functions, could map to the
same position in the Bloom Filter.
Now to test if a given element, say w , is a member of a set, you need to pass it through the k
hash functions and map the outcome on the Bloom Filter array. If the value at any of the mapped
positions is 0, the element is not a member of the set. If the value at all the positions is 1, then either
the element is a member of the set or the element maps to one or more position where the value was
set to 1 by another element. Therefore, false positives are possible but false negatives are not.
Learn more about Bloom Filter, explained in the context of PERL, at www.perl.com/pub/2004/
04/08/bloom_filters.html .
APACHE CASSANDRA
Apache Cassandra is simultaneously a very popular and infamous NoSQL database. A few
examples that used Cassandra in the early part of the topic introduced the core ideas of the store.
In this section, I review Cassandra's core architecture to understand how it works.
Peer-to-Peer Model
Most databases, including the most popular of NoSQL stores, follow a master-slave model for
scaling out and replication. This means for each set, writes are committed to the master node and
replicated down to the slaves. The slaves provide enhanced read scalability but not write scalability.
Cassandra moves away from the master-slave model and instead uses a peer-to-peer model. This
means there is no single master but all the nodes are potentially masters. This makes the writes and
reads extremely scalable and even allows nodes to function in cases of partition tolerance. However,
extreme scalability comes at a cost, which in this case is a compromise in strong consistency. The
peer-to-peer model follows a weak consistency model.
Based on Gossip and Anti-entropy
Cassandra's peer-to-peer scalability and eventual consistency model makes it important to
establish a protocol to communicate among the peers and detect node failure. Cassandra relies on
a gossip-based protocol to communicate among the nodes. Gossip, as the name suggests, uses an
Search WWH ::




Custom Search