Bloom filters have the disadvantage that false positives are possible, but their advantage
is that they can be very fast because they use space efficiently: unlike simple arrays,
hashtables, or linked lists, they do not store their elements completely. Instead, a Bloom
filter keeps a compact summary of its elements in memory, which reduces disk access. One
consequence of not storing the elements themselves is that the number of false positives
increases as the number of elements increases.
Bloom filters are used by Apache Hadoop, Google Bigtable, and Squid Proxy Cache. They
are named for their inventor, Burton Bloom.
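The trade-off described above can be illustrated with a minimal sketch. It is not the
implementation used by any of the systems named here; the class name, the method names, and
the choice of SHA-256 with double hashing are assumptions made for illustration only.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array plus k derived hash positions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Only bits are stored, never the elements themselves.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Assumption: derive k positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter(num_bits=1024, num_hashes=3)
bf.add("row-key-1")
if bf.might_contain("row-key-1"):
    print("possibly present: go to disk to confirm")
```

A negative answer is always correct, so the disk read can be skipped. A positive answer may
be a false positive, and these become more likely as more elements set more bits.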
Cassandra
In Greek mythology, Cassandra was the daughter of King Priam and Queen Hecuba of Troy.
She was so beautiful that the god Apollo gave her the ability to see the future. But when
she refused his amorous advances, he cursed her such that she would accurately predict
everything that would happen, yet no one would believe her. Cassandra foresaw the destruction
of her city of Troy, but was powerless to stop it. The Cassandra distributed database is
named for her.
The data store itself is an Apache project available at http://cassandra.apache.org. It started
in incubator status in January of 2009. It has the following key properties: it is decentralized,
elastic, fault-tolerant, tuneably consistent, highly available, and designed to massively scale
on commodity servers spread across different data centers. It is in use at companies such
as Digg, Facebook, Twitter, Cloudkick, Cisco, IBM, Reddit, Rackspace, SimpleGeo, Ooyala,
and OpenX.
Cassandra was originally written at Facebook to solve their Inbox Search problem. The team
was led by Jeff Hammerbacher, with Avinash Lakshman, Karthik Ranganathan, and Prashant
Malik, a Facebook engineer on the Search Team, as key engineers. The code was released
as an open source Google Code project in July of 2008. In March of 2009, it was moved to
an Apache Incubator project, and on February 17, 2010, it was voted into a top-level
project.
A central paper on Cassandra by Facebook's Lakshman and Malik called “A Decentralized
Structured Storage System” is available at
http://www.cs.cornell.edu/projects/ladis2009/papers/lakshman-ladis2009.pdf .
A blog post from 2008 by Avinash Lakshman describes how they were using Cassandra at
Facebook: http://www.facebook.com/note.php?note_id=24413138919&id=9445547199&index=9 .
It is easy to see why the Cassandra database is aptly named: its community asserts that
Cassandra and other related NoSQL databases are the future. Despite widespread use of
eventually consistent databases at companies such as Amazon, Google, Facebook, and Twitter,