inefficiency of RDBMSs for this type of application. The reliability and scaling
requirements of the services on this platform are high. Moreover, availability is very
important, as even a slight increase in latency can cause financial losses.
Dynamo provides a flexible design in which services can control the tradeoffs among
availability, consistency, cost-effectiveness and performance. Dynamo's data model
relies on a simple key/value scheme. Since the targeted applications and services within
Amazon do not require complex querying models, record- or key-based queries are
considered both sufficient in terms of requirements and efficient in terms of
performance and scaling.
Dynamo's design relies on a partitioning scheme based on consistent hashing [33]. In
the implemented scheme, the output range (space) of a hash function is treated as a
ring. Every member of the ring is a virtual node (host), and a physical node may be
responsible for one or more virtual nodes. Placing virtual nodes on the ring, instead of
fixed physical nodes, provides better availability and load balancing under failures.
Each data item is assigned to a node on the ring based on its key: the hashed value of
the key determines the item's position on the ring, and the item is assigned to the first
node encountered when walking the ring clockwise from that position. Moreover, data
is replicated on the next K − 1 nodes for a given replication factor K, skipping
virtual nodes that belong to the same physical node. All nodes in Dynamo are
considered equal, and each can compute the preference list for any given key, that is,
the list of nodes that store a copy of the data referenced by that key.
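To make the placement rule concrete, here is a minimal sketch of a consistent-hashing ring with virtual nodes that computes a preference list by walking clockwise and skipping virtual nodes of a physical host that already holds a replica. It is an illustration under stated assumptions, not Dynamo's implementation: MD5 is assumed as the ring hash, and the host names (`A` to `D`), the virtual-node label format `host#vnodeN`, and the names `Ring`, `ring_position` and `preference_list` are all hypothetical.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    # Map a string onto a 128-bit ring using MD5 (assumed hash function).
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest(), "big")


class Ring:
    """Consistent-hashing ring in which each physical host owns several virtual nodes."""

    def __init__(self, vnodes_per_host):
        # tokens: sorted (position, physical_host) pairs, one per virtual node.
        # The "host#vnodeN" labels are a hypothetical naming scheme.
        self.tokens = sorted(
            (ring_position(f"{host}#vnode{i}"), host)
            for host, count in vnodes_per_host.items()
            for i in range(count)
        )
        self.positions = [pos for pos, _ in self.tokens]

    def preference_list(self, key, k):
        """Return up to k distinct physical hosts responsible for `key`."""
        # First virtual node strictly after the key's position, i.e. clockwise.
        start = bisect.bisect_right(self.positions, ring_position(key))
        hosts = []
        for step in range(len(self.tokens)):
            # Wrap around the end of the ring with the modulo.
            _, host = self.tokens[(start + step) % len(self.tokens)]
            # Skip virtual nodes whose physical host already holds a replica.
            if host not in hosts:
                hosts.append(host)
                if len(hosts) == k:
                    break
        return hosts


ring = Ring({"A": 3, "B": 3, "C": 3, "D": 3})
print(ring.preference_list("cart:alice", k=3))
```

The first entry of the returned list is the node that owns the key; the remaining K − 1 entries are the replica holders. Because every node can rebuild the same sorted token list, any node can answer this query locally, which matches the text's point that all nodes are equal.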
Dynamo is an eventually consistent system: updates are propagated to replicas
asynchronously. Because data usually remains available while updates are being
propagated, clients may perform updates on older versions of data for which the latest
updates have not yet been committed. As a result, the system may suffer from update
conflicts. To deal with these situations, Dynamo relies on data versioning: every
updated replica is assigned a new immutable version. Conflicting versions resulting
from concurrent updates can be resolved at a later time, which allows the system to
remain always available and fast to respond to client requests. Versions that share a
causal relation are easy for the system to resolve through syntactic reconciliation,
since the newer version simply supersedes the older one. However, a difficulty arises
when versions branch. This often happens when failures are combined with concurrent
updates, and it results in conflicting versions of data.
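The text does not say how the system tells a causally related version from a branched one. The Dynamo design uses vector clocks for this, and the sketch below assumes that representation: a clock maps a node identifier to a counter, and one version supersedes another when its clock is at least as large in every entry. The node names `Sx`, `Sy`, `Sz` and the helper names are hypothetical.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one (a new version)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a, b):
    """True if the version with clock `a` causally follows (or equals) clock `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def syntactic_reconcile(versions):
    """Discard versions that are ancestors of another version.

    `versions` is a list of (clock, value) pairs. One survivor means the
    system resolved the conflict on its own; several survivors mean the
    history has branched and semantic reconciliation is needed.
    """
    survivors = []
    for clock, value in versions:
        # Obsolete if some other version strictly descends from this one.
        obsolete = any(descends(other, clock) and other != clock for other, _ in versions)
        duplicate = any(kept == clock for kept, _ in survivors)
        if not obsolete and not duplicate:
            survivors.append((clock, value))
    return survivors


v1 = increment({}, "Sx")   # write handled by hypothetical node Sx
v2 = increment(v1, "Sx")   # later write at Sx: descends from v1
v3 = increment(v2, "Sy")   # concurrent update handled by Sy
v4 = increment(v2, "Sz")   # concurrent update handled by Sz

print(syntactic_reconcile([(v1, "a"), (v2, "b")]))
# → [({'Sx': 2}, 'b')]
print(syntactic_reconcile([(v2, "b"), (v3, "c"), (v4, "d")]))
# → [({'Sx': 2, 'Sy': 1}, 'c'), ({'Sx': 2, 'Sz': 1}, 'd')]
```

In the first call the versions are causally related and a single version survives. In the second, neither `v3` nor `v4` descends from the other, so both survive: this is the branching case the system cannot settle by itself.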
When versions branch, the system cannot reconcile them itself because it lacks the
semantic context, so reconciliation is left to the client. The client collapses the
multiple data versions into one (semantic reconciliation). A simple example is the
shopping cart application, which reconciles diverging versions by merging them. To
detect inconsistencies between replicas and repair them in the event of failures and
other threats to data durability, Dynamo implements an anti-entropy replica
synchronization protocol.
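The shopping cart's merge strategy can be sketched as follows. This assumes that a cart is a set of item identifiers, that the merge is a plain set union, and that versions carry vector clocks; none of these details come from the text, and `reconcile_carts`, `merge_clocks` and the item names are hypothetical.

```python
def merge_clocks(clocks):
    """Element-wise maximum: the merged clock descends from every input clock."""
    merged = {}
    for clock in clocks:
        for node, count in clock.items():
            merged[node] = max(merged.get(node, 0), count)
    return merged


def reconcile_carts(siblings):
    """Collapse branched cart versions into one by taking the union of their items.

    `siblings` is a list of (clock, frozenset_of_item_ids) pairs.
    """
    clock = merge_clocks(c for c, _ in siblings)
    items = frozenset().union(*(cart for _, cart in siblings))
    return clock, items


# Two branched versions of the same cart (hypothetical nodes Sx, Sy, Sz).
siblings = [
    ({"Sx": 2, "Sy": 1}, frozenset({"book"})),
    ({"Sx": 2, "Sz": 1}, frozenset({"book", "pen"})),
]
clock, items = reconcile_carts(siblings)
print(clock, sorted(items))  # → {'Sx': 2, 'Sy': 1, 'Sz': 1} ['book', 'pen']
```

A union merge never loses an item that was added on either branch, but an item removed on only one branch can reappear after the merge. The client would then write the merged cart back, presumably with the merged clock as its context, so that the new version supersedes both siblings.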
Clients can interact with Dynamo through a flexible API that provides various
consistency configurations. Replica consistency is handled by a quorum-like system.
In a system that maintains N replicas, R is the minimum number of nodes (replicas)
that must participate in a read operation, and W is the minimum number of nodes
that must participate in a write operation. These two parameters are configured on a
per-operation basis and largely determine the system's behavior: by setting them, one
can define the tradeoff between consistency and latency. A configuration that provides R + W > N