levels of persistence when setting up the database. There is no locking support, however, for operating across sets of documents atomically. Such abstractions are left to application code to implement in a domain-specific manner.
Because stored documents are not connected (save through indexes), numerous optimistic concurrency control mechanisms can be used to help reconcile concurrent, contending writes to a single document without having to resort to strict locks. In fact, some document stores (like CouchDB) have made this a key point of their value proposition: documents can be held in a multimaster database that automatically replicates concurrently accessed, contended state across instances without undue interference from the user.
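To make the idea concrete, here is a minimal in-memory sketch of revision-based optimistic concurrency, loosely in the spirit of CouchDB's per-document `_rev` field. The class and its API are hypothetical illustrations, not CouchDB's actual interface:

```python
import threading

class ConflictError(Exception):
    """Raised when a write carries a stale revision."""

class OptimisticStore:
    """Hypothetical in-memory document store using revision numbers
    to detect concurrent, contending writes to a single document."""

    def __init__(self):
        self._docs = {}                # doc_id -> (revision, document)
        self._lock = threading.Lock()  # guards the map, not the documents

    def get(self, doc_id):
        """Return (revision, document); callers keep the revision
        and present it back when they write."""
        return self._docs[doc_id]

    def put(self, doc_id, document, expected_rev=0):
        """Succeed only if expected_rev matches the stored revision;
        a mismatch means another writer got there first."""
        with self._lock:
            current_rev = self._docs.get(doc_id, (0, None))[0]
            if expected_rev != current_rev:
                raise ConflictError(
                    f"stale revision {expected_rev}, store has {current_rev}")
            new_rev = current_rev + 1
            self._docs[doc_id] = (new_rev, document)
            return new_rev
```

A writer that loses the race gets a `ConflictError` rather than silently overwriting, and can re-read the document and retry; no document is ever locked between the read and the write.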
In other stores, too, the database management system may be able to distinguish and reconcile writes to different parts of a document, or even use logical timestamps to reconcile several contended writes into a single, logically consistent outcome. This is a reasonable optimistic trade-off: it reduces the need for transactions (which tend to add latency and decrease availability) by using alternative mechanisms that optimistically provide greater availability, lower latency, and higher throughput.
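One such mechanism can be sketched as field-level last-write-wins reconciliation: each field carries a logical timestamp, and a merge keeps, per field, whichever value was written "later." The representation below (fields mapped to `(timestamp, value)` pairs) is an assumption for illustration, not any particular store's format:

```python
def reconcile(doc_a, doc_b):
    """Merge two contended versions of a document field by field,
    keeping, for each field, the value carrying the higher logical
    timestamp (last-write-wins)."""
    merged = {}
    for field in doc_a.keys() | doc_b.keys():
        # A missing field loses to any present one (timestamp -1).
        a = doc_a.get(field, (-1, None))
        b = doc_b.get(field, (-1, None))
        # Tuple comparison looks at the timestamp first; on a
        # timestamp tie the value itself breaks the tie, so every
        # replica converges on the same outcome.
        merged[field] = max(a, b)
    return merged
```

Because the merge is deterministic and commutative, two replicas that saw the writes in different orders still reconcile to the same document.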
Though optimistic concurrency control mechanisms are useful, we also rather like transactions, and there are numerous examples of high-throughput transaction processing systems in the literature.
Key-Value Stores
Key-value stores are cousins of the document store family, but their lineage comes from Amazon's Dynamo database. They act like large, distributed hashmap data structures that store and retrieve opaque values by key.
As shown in Figure A-3, the key space of the hashmap is spread across numerous buckets on the network. For fault-tolerance reasons, each bucket is replicated onto several machines. The formula for the number of replicas required is given by R = 2F + 1, where F is the number of failures we can tolerate. The replication algorithm seeks to ensure that machines aren't exact copies of each other. This allows the system to load-balance while a machine and its buckets recover; it also helps avoid hotspots, which can cause inadvertent self-denial-of-service.
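The replica formula and the spread of copies can be sketched as follows. The ring-order placement shown here is a simplifying assumption, not the exact algorithm any particular store uses:

```python
import hashlib

def replicas_required(f):
    """R = 2F + 1: the replica count that still leaves a majority
    of copies alive after F machines fail."""
    return 2 * f + 1

def replica_machines(key, machines, f):
    """Hypothetical placement sketch: hash the key to a primary
    machine, then put each further replica on the next distinct
    machine in a fixed ring order, so no machine holds two copies
    of the same bucket."""
    r = replicas_required(f)
    if r > len(machines):
        raise ValueError("not enough machines for the replica count")
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(machines)
    return [machines[(start + i) % len(machines)] for i in range(r)]
```

For example, tolerating a single failure (F = 1) requires 2 x 1 + 1 = 3 replicas, each placed on a different machine so that losing one machine never loses more than one copy of a bucket.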
From the client's point of view, key-value stores are easy to use. A client stores a data
element by hashing a domain-specific identifier (key). The hash function is crafted such
that it provides a uniform distribution across the available buckets, which ensures that
no single machine becomes a hotspot. Given the hashed key, the client can use that