Database Reference
In-Depth Information
caching layers atop a more traditional database or as simple persistence layers for
ephemeral services like job queues.
S OPHISTICATED KEY - VALUE STORES
It's possible to refine the simple key-value model to handle complicated read/write
schemes or to provide a richer data model. In these cases, we end up with what we'll
term a sophisticated key-value store . One example is Amazon's Dynamo, described in a
widely studied white paper entitled Dynamo: Amazon's Highly Available Key-value Store .
The aim of Dynamo is to be a database robust enough to continue functioning in the
face of network failures, data center outages, and other similar disruptions. This
requires that the system can always be read from and written to, which essentially
requires that data be automatically replicated across multiple nodes. If a node fails, a
user of the system, perhaps in this case a customer with an Amazon shopping cart,
won't experience any interruptions in service. Dynamo provides ways of resolving the
inevitable conflicts that arise when a system allows the same data to be written to mul-
tiple nodes. At the same time, Dynamo is easily scaled. Because it's masterless—all
nodes are equal—it's easy to understand the system as a whole, and nodes can be
added easily. Although Dynamo is a proprietary system, the ideas used to build it have
inspired many systems falling under the NoSQL umbrella, including Cassandra, Proj-
ect Voldemort, and Riak.
Looking at who developed these sophisticated key-value stores, and how they've
been used in practice, you can see where these systems shine. Let's take Cassandra,
which implements many of Dynamo's scaling properties while providing a column-
oriented data model inspired by Google's BigTable. Cassandra is an open source ver-
sion of a data store built by Facebook for its inbox search feature. The system scaled
horizontally to index more than 50 TB of inbox data, allowing for searches on inbox
keywords and recipients. Data was indexed by user ID , where each record consisted of
an array of search terms for keyword searches and an array of recipient ID s for recipi-
ent searches. 9
These sophisticated key-value stores were developed by major internet companies
such as Amazon, Google, and Facebook to manage cross sections of systems with
extraordinarily large amounts of data. In other words, sophisticated key-value stores
manage a relatively self-contained domain that demands significant storage and avail-
ability. Because of their masterless architecture, these systems scale easily with the
addition of nodes. They opt for eventual consistency, which means that reads don't
necessarily reflect the latest write. But what users get in exchange for weaker consis-
tency is the ability to write in the face of any one node's failure.
This contrasts with MongoDB, which provides strong consistency, a single master
(per shard), a richer data model, and secondary indexes. The last two of these attri-
butes go hand in hand; if a system allows for modeling multiple domains, as, for
example, would be required to build a complete web application, then it's going to be
necessary to query across the entire data model, thus requiring secondary indexes.
9
http://mng.bz/5321
 
Search WWH ::




Custom Search