Consistency Management in Cloud Storage Systems - Large Scale and Big Data: Processing and Management


inefficiency of RDBMSs for this type of application. The reliability and scaling requirements of the services on this platform are high. Availability is also critical, as even a marginal increase in latency can cause financial losses.

Dynamo provides a flexible design in which services can control the tradeoffs between availability, consistency, cost-effectiveness and performance. Dynamo's data model relies on a simple key/value scheme. Since the targeted applications and services within Amazon do not require complex query models, record- or key-based queries are considered sufficient in terms of requirements and efficient in terms of performance scaling.

Dynamo's design relies on a partitioning scheme based on consistent hashing [33]. In the implemented scheme, the output range (space) of a hash function is treated as a ring. Every member of the ring is a virtual node (host), and a physical node may be responsible for one or more virtual nodes. Placing virtual nodes on the ring, instead of fixed physical nodes, provides better availability and load balancing under failures. Each data item is assigned to a node on the ring based on its key: the hash of the key determines the item's position on the ring, and the item is assigned to the first node encountered when moving clockwise from that position. The item is then replicated on the (K − 1) successive nodes for a given replication factor K, skipping virtual nodes that belong to physical nodes already holding a replica. All nodes in Dynamo are peers, and any of them can compute the preference list for a given key, that is, the list of nodes that store a copy of the data referenced by the key.
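The partitioning and replica-placement steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Dynamo's implementation. Several choices are made only for the example:

- MD5 as the ring hash.
- A hypothetical token label `<node>#<i>` for each virtual node.
- A fixed number of virtual nodes per physical node (`vnodes_per_node`).

The method `preference_list` returns the list of nodes holding a key's replicas; "preference list" is the term used in the original Dynamo paper. It walks the ring clockwise from the key's position and skips virtual nodes whose physical node has already been chosen.

```python
import bisect
import hashlib


def ring_position(value):
    """Map a string to a point on a 128-bit ring (MD5 is an assumed choice of hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """A hash ring on which each physical node owns several virtual nodes (tokens)."""

    def __init__(self, physical_nodes, vnodes_per_node=8):
        # Each physical node is hashed onto the ring once per virtual node.
        # The token label "<node>#<i>" is a hypothetical naming scheme for this sketch.
        self._tokens = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in physical_nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [position for position, _ in self._tokens]

    def preference_list(self, key, k):
        """Return up to k distinct physical nodes responsible for `key`.

        The first entry owns the first virtual node at or after the key's hash
        (moving clockwise and wrapping around); the remaining entries are the
        next physical nodes found clockwise, skipping virtual nodes whose
        physical node is already in the list.
        """
        start = bisect.bisect_left(self._positions, ring_position(key))
        chosen = []
        for step in range(len(self._tokens)):
            _, node = self._tokens[(start + step) % len(self._tokens)]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == k:
                    break
        return chosen


if __name__ == "__main__":
    ring = ConsistentHashRing(["node-a", "node-b", "node-c", "node-d"])
    # With replication factor K = 3, the key is stored on three distinct physical nodes.
    print(ring.preference_list("cart:alice", k=3))
```

Every node that knows the membership can rebuild the same sorted token list, so every node computes the same preference list for a key. Giving each physical node several virtual nodes splits its share of the ring into many small ranges. When it fails, its load is therefore spread over several other nodes. The sketch ignores membership changes and failure detection.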

Dynamo is an eventually consistent system. Updates are propagated to replicas asynchronously. Because data usually remains available while updates are being propagated, clients may perform updates on older versions of data that do not yet reflect the latest updates. As a result, the system may suffer from update conflicts. To deal with such situations, Dynamo relies on data versioning: every update produces a new, immutable version of the data. Conflicting versions resulting from concurrent updates can be resolved at a later time, which allows the system to remain always available and to respond quickly to client requests. Versions that are causally related are easy for the system to resolve through syntactic reconciliation, since the newer version simply supersedes its ancestors. The difficulty arises when versions branch. This often happens when failures combine with concurrent updates, and it results in conflicting versions of the data. In this case, reconciliation is left to the client rather than the system, because the system lacks the semantic context needed to decide between the versions. The client collapses the multiple data versions into one (semantic reconciliation). A simple example is the shopping cart application, which reconciles diverging versions by merging them. To detect inconsistencies between replicas and repair them after failures and other threats to data durability, Dynamo implements an anti-entropy replica synchronization protocol.
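The versioning and reconciliation steps above can be sketched as follows. The text does not say how versions are compared. This sketch assumes vector clocks (maps from a node identifier to a counter), which is the mechanism described in the original Dynamo paper. All function names here are hypothetical: `increment`, `descends`, `reconcile_syntactically`, `merge_clocks`, `merge_carts`.

- `reconcile_syntactically` discards every version that some other version descends from; this is syntactic reconciliation.
- Whatever remains are concurrent branches. `merge_carts` collapses them the way the shopping cart example suggests, by taking the union of the items.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one (a new version)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def reconcile_syntactically(versions):
    """Keep only versions that no other version descends from.

    `versions` is a list of (clock, value) pairs. The result holds either a
    single winning version or several mutually concurrent branches.
    """
    survivors = []
    for clock, value in versions:
        if any(descends(kept, clock) for kept, _ in survivors):
            continue  # an ancestor (or duplicate) of a version already kept
        # Drop kept versions that this one supersedes, then keep it.
        survivors = [(k, v) for k, v in survivors if not descends(clock, k)]
        survivors.append((clock, value))
    return survivors


def merge_clocks(clocks):
    """Component-wise maximum: a clock that descends from every input clock."""
    merged = {}
    for clock in clocks:
        for node, count in clock.items():
            merged[node] = max(merged.get(node, 0), count)
    return merged


def merge_carts(branches):
    """Semantic reconciliation for a shopping cart: union the items of all branches."""
    items = frozenset().union(*(value for _, value in branches))
    return merge_clocks(clock for clock, _ in branches), items


if __name__ == "__main__":
    v0 = ({"sx": 1}, frozenset({"book"}))
    v1 = (increment(v0[0], "sx"), frozenset({"book", "pen"}))
    # Two replicas update v1 concurrently (e.g. during a network partition).
    v2 = (increment(v1[0], "sy"), frozenset({"book", "pen", "mug"}))
    v3 = (increment(v1[0], "sz"), frozenset({"book", "pen", "lamp"}))
    branches = reconcile_syntactically([v0, v1, v2, v3])  # v0 and v1 are discarded
    clock, cart = merge_carts(branches)
    print(clock, sorted(cart))
```

The Dynamo paper notes a trade-off of the union strategy. An item added to the cart is never lost, but an item deleted on one branch can reappear after the merge. When the client writes the merged cart back, the coordinating node would increment its own counter in the merged clock. The new version then descends from both branches and supersedes them.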

Clients can interact with Dynamo through a flexible API that provides various consistency configurations. Replica consistency is handled by a quorum-like system. In a system that maintains N replicas, R is the minimum number of nodes (replicas) that must participate in a read operation, and W is the minimum number of nodes that must participate in a write operation. Both parameters are configured on a per-operation basis and are of high importance: by setting them, one can define the tradeoff between consistency and latency. A configuration that provides R + W > N
