Glossary - Cassandra: The Definitive Guide

java.util.concurrent library components such as ExecutorService and Future<T>

for writing data to buffers.

Avro

Avro is (probably) replacing Thrift as the RPC client for interacting with Cassandra. Avro is

a subproject of the Apache Hadoop project, created by Doug Cutting (creator of Hadoop and

Lucene). It provides functionality similar to Thrift, but is a dynamic data serialization library

that has an advantage over Thrift in that it does not require static code generation. Another

reason that the project is migrating to Avro is that Thrift was originally created by Facebook

and then donated to Apache, but since that time has received little active development

attention.

This means that the Cassandra server will be ported from

org.apache.cassandra.thrift.CassandraServer to

org.apache.cassandra.avro.CassandraServer. As of this writing, this is underway but

not yet complete.

You can find out more about Avro at its project page, http://avro.apache.org.

Bigtable

Bigtable is a distributed database created at Google in 2006 as a high-performance columnar

database on top of Google File System (GFS). Bigtable and Amazon's Dynamo database are

the direct parents of Cassandra. Cassandra inherits these aspects from Bigtable: sparse array

data and disk storage using an SSTable.

Apache HBase is an open source Bigtable clone.

You can read the complete Google Bigtable paper at

http://labs.google.com/papers/bigtable.html.

Bloom Filter

In simple terms, a Bloom filter is a very fast, probabilistic data structure for testing whether

an element is a member of a set. It is probabilistic rather than exact because it is possible

to get a false-positive read but not a false-negative. Bloom filters work by hashing the values

in a dataset into a bit array, condensing a larger dataset into a compact digest. The digest,

by definition, uses a much smaller amount of memory than the original data would.

Cassandra uses Bloom filters to reduce disk access, which can be expensive, on key lookups.

Every SSTable has an associated Bloom filter; when a query is performed, the Bloom filter

is checked before accessing disk. Because false-negatives are not possible, if the filter

indicates that the element does not exist in the set, it certainly doesn't; if the filter indicates

that the element might be in the set, the disk is accessed to make sure.
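
The lookup behavior described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Cassandra's actual implementation: the class names SimpleBloomFilter and FilteredTable, the FNV-1a hashing, the double-hashing scheme, and the bit-array sizes are all hypothetical choices made for the example, and an in-memory map stands in for the on-disk SSTable.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Minimal Bloom filter sketch (hypothetical; not Cassandra's own class). */
class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Sets every bit position derived from the key. */
    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    /** false: the key was definitely never added; true: it may have been. */
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int index(String key, int i) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // forced odd so it is never zero
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // 32-bit FNV-1a with a caller-supplied starting value.
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }
}

/** Hypothetical stand-in for an SSTable: a map plays the role of the disk. */
class FilteredTable {
    private final Map<String, String> disk = new HashMap<>();
    private final SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
    int diskReads = 0; // counts how often the "disk" was actually consulted

    void put(String key, String value) {
        disk.put(key, value);
        filter.add(key);
    }

    String get(String key) {
        if (!filter.mightContain(key)) {
            return null; // definitely absent, so the disk read is skipped
        }
        diskReads++;
        return disk.get(key); // still null if the filter gave a false positive
    }

    public static void main(String[] args) {
        FilteredTable table = new FilteredTable();
        table.put("row1", "value1");
        System.out.println("row1 -> " + table.get("row1"));
        System.out.println("disk reads so far: " + table.diskReads);
    }
}
```

The design point to notice is in get: a negative answer from the filter returns immediately, while a positive answer still goes to the backing store, since only the store can rule out a false positive. A larger bit array or a different number of hash functions changes the false-positive rate but never introduces false negatives.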
