levels of persistence when setting up the database. There is no locking support, however, for operating across sets of documents atomically. Such abstractions are left to application code to implement in a domain-specific manner.
Because stored documents are not connected (save through indexes), numerous optimistic concurrency control mechanisms can be used to help reconcile concurrent, contending writes to a single document without having to resort to strict locks. In fact, some document stores (like CouchDB) have made this a key point of their value proposition: documents can be held in a multimaster database that automatically replicates concurrently accessed, contended state across instances without undue interference from the user.
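To make the idea concrete, here is a minimal in-memory sketch of revision-based optimistic concurrency, loosely in the spirit of CouchDB's per-document `_rev` field. The class and its API are hypothetical illustrations, not CouchDB's actual interface:

```python
import threading

class ConflictError(Exception):
    """Raised when a write carries a stale revision."""

class OptimisticStore:
    """Hypothetical in-memory document store using revision numbers
    to detect concurrent, contending writes to a single document."""

    def __init__(self):
        self._docs = {}                # doc_id -> (revision, document)
        self._lock = threading.Lock()  # guards the map, not the documents

    def get(self, doc_id):
        """Return (revision, document); callers keep the revision
        and present it back when they write."""
        return self._docs[doc_id]

    def put(self, doc_id, document, expected_rev=0):
        """Succeed only if expected_rev matches the stored revision;
        a mismatch means another writer got there first."""
        with self._lock:
            current_rev = self._docs.get(doc_id, (0, None))[0]
            if expected_rev != current_rev:
                raise ConflictError(
                    f"stale revision {expected_rev}, store has {current_rev}")
            new_rev = current_rev + 1
            self._docs[doc_id] = (new_rev, document)
            return new_rev
```

A writer that loses the race gets a `ConflictError` rather than silently overwriting, and can re-read the document and retry; no document is ever locked between the read and the write.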
In other stores, too, the database management system may be able to distinguish and reconcile writes to different parts of a document, or even use logical timestamps to reconcile several contended writes into a single, logically consistent outcome. This is a reasonable optimistic trade-off: it reduces the need for transactions (which tend to add latency and decrease availability) by using alternative mechanisms that optimistically provide greater availability, lower latency, and higher throughput.
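One such mechanism can be sketched as field-level last-write-wins reconciliation: each field carries a logical timestamp, and a merge keeps, per field, whichever value was written "later." The representation below (fields mapped to `(timestamp, value)` pairs) is an assumption for illustration, not any particular store's format:

```python
def reconcile(doc_a, doc_b):
    """Merge two contended versions of a document field by field,
    keeping, for each field, the value carrying the higher logical
    timestamp (last-write-wins)."""
    merged = {}
    for field in doc_a.keys() | doc_b.keys():
        # A missing field loses to any present one (timestamp -1).
        a = doc_a.get(field, (-1, None))
        b = doc_b.get(field, (-1, None))
        # Tuple comparison looks at the timestamp first; on a
        # timestamp tie the value itself breaks the tie, so every
        # replica converges on the same outcome.
        merged[field] = max(a, b)
    return merged
```

Because the merge is deterministic and commutative, two replicas that saw the writes in different orders still reconcile to the same document.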
Though optimistic concurrency control mechanisms are useful, we also rather like transactions, and there are numerous examples of high-throughput transaction processing systems in the literature.
Key-Value Stores
Key-value stores are cousins of the document store family, but their lineage comes from Amazon's Dynamo database. They act like large, distributed hashmap data structures that store and retrieve opaque values by key.
As shown in Figure A-3, the key space of the hashmap is spread across numerous buckets on the network. For fault-tolerance reasons, each bucket is replicated onto several machines. The formula for the number of replicas required is given by R = 2F + 1, where F is the number of failures we can tolerate. The replication algorithm seeks to ensure that machines aren't exact copies of each other. This allows the system to load-balance while a machine and its buckets recover; it also helps avoid hotspots, which can cause inadvertent self-denial-of-service.
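The replica formula and the spread of copies can be sketched as follows. The ring-order placement shown here is a simplifying assumption, not the exact algorithm any particular store uses:

```python
import hashlib

def replicas_required(f):
    """R = 2F + 1: the replica count that still leaves a majority
    of copies alive after F machines fail."""
    return 2 * f + 1

def replica_machines(key, machines, f):
    """Hypothetical placement sketch: hash the key to a primary
    machine, then put each further replica on the next distinct
    machine in a fixed ring order, so no machine holds two copies
    of the same bucket."""
    r = replicas_required(f)
    if r > len(machines):
        raise ValueError("not enough machines for the replica count")
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(machines)
    return [machines[(start + i) % len(machines)] for i in range(r)]
```

For example, tolerating a single failure (F = 1) requires 2 x 1 + 1 = 3 replicas, each placed on a different machine so that losing one machine never loses more than one copy of a bucket.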
From the client's point of view, key-value stores are easy to use. A client stores a data
element by hashing a domain-specific identifier (key). The hash function is crafted such
that it provides a uniform distribution across the available buckets, which ensures that
no single machine becomes a hotspot. Given the hashed key, the client can use that