Database Reference
In-Depth Information
java.util.concurrent library components such as ExecutorService and Future<T>
for writing data to buffers.
Avro
Avro is (probably) replacing Thrift as the RPC client for interacting with Cassandra. Avro is
a subproject of the Apache Hadoop project, created by Doug Cutting (creator of Hadoop and
Lucene). It provides functionality similar to Thrift, but is a dynamic data serialization library
that has an advantage over Thrift in that it does not require static code generation. Another
reason that the project is migrating to Avro is that Thrift was originally created by Facebook
and then donated to Apache, but since that time has received little active development attention.
This means that the Cassandra server will be ported from
org.apache.cassandra.thrift.CassandraServer to
org.apache.cassandra.avro.CassandraServer. As of this writing, this is underway but
not yet complete.
You can find out more about Avro at its project page, http://avro.apache.org.
Bigtable
Bigtable is a distributed database created at Google and described in a paper published in
2006. It is a high-performance columnar database built on top of Google File System (GFS).
Bigtable and Amazon's Dynamo database are the direct parents of Cassandra. Cassandra
inherits these aspects from Bigtable: sparse array data and disk storage using an SSTable.
Apache HBase is an open source Bigtable clone.
You can read the complete Google Bigtable paper at http://labs.google.com/papers/bigtable.html.
Bloom Filter
In simple terms, a Bloom filter is a very fast, probabilistic data structure for testing whether
an element is a member of a set. It is probabilistic because a lookup can return a false positive
but never a false negative. A Bloom filter works by hashing each value in a dataset to a few
positions in a bit array and setting those bits, which condenses a large dataset into a compact
digest. The digest uses a much smaller amount of memory than the original data would.
Cassandra uses Bloom filters to reduce disk access, which can be expensive, on key lookups.
Every SSTable has an associated Bloom filter; when a query is performed, the Bloom filter
is checked first before accessing disk. Because false-negatives are not possible, if the filter
indicates that the element does not exist in the set, it certainly doesn't; if the filter thinks that
the element is in the set, the disk is accessed to make sure.
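
The lookup path described above can be illustrated with a minimal Python sketch. It is not Cassandra's actual implementation: the class names BloomFilter and SSTableSketch, the bit-array size, the number of hashes, and the use of salted SHA-256 as the hash function are all assumptions chosen for readability, and a plain dictionary stands in for the on-disk SSTable.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: k hash positions per key in an m-bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity only

    def _positions(self, key):
        # Derive k positions by salting a single hash function with an index
        # (an assumption of this sketch, not Cassandra's hashing scheme).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means only "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


class SSTableSketch:
    """Hypothetical stand-in for an SSTable; a dict plays the role of disk."""

    def __init__(self, rows):
        self._disk = dict(rows)
        self.disk_reads = 0
        self.filter = BloomFilter()
        for key in self._disk:
            self.filter.add(key)

    def get(self, key):
        if not self.filter.might_contain(key):
            return None  # no false negatives, so the disk read is skipped
        self.disk_reads += 1  # filter said "maybe": go to disk to make sure
        return self._disk.get(key)


table = SSTableSketch({"alice": 1, "bob": 2})
print(table.get("alice"), table.disk_reads)
```

A key that was added always passes the filter and costs one disk read. A key that was never added is usually rejected by the filter without touching the disk; in the rare false-positive case the disk read happens anyway and simply finds nothing, so the answer is still correct.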