Understanding the Storage Architecture - Professional NoSQL - page 94

Databases Reference

In-Depth Information

the public via a technical paper, which is available online at www.allthingsdistributed.com/

files/amazon-dynamo-sosp2007.pdf . In due course, the ideas of Dynamo where incorporated

into open-source implementations like Apache Cassandra, Voldemort, Riak, and Dynomite. In this

section the fundamental tenets of an eventually consistent key/value store are discussed. Specifi cs of

any of the open-source implementations are left for later chapters.

Amazon Dynamo powers a lot of the internal Amazon services that drive its massive e-commerce

system. This system has a few essential requirements like high availability and fault tolerance.

However, data sets are structured such that query by primary keys is enough for most cases.

Relational references and joins are not required. Dynamo is built on the ideas of consistent hashing,

object versioning, gossip-based membership protocol, merkle trees, and hinted handoff.

Dynamo supports simple get-and-put-based interface to the data store. Put requests include data

related to object version, which are stored in the context. Dynamo is built to incrementally scale as

the data grows. Thus, it relies on consistent hashing for effective partitioning.

Consistent Hashing

Consistent hashing forms an important principle for distributed hash tables. In consistent hashing,

addition or removal of a slot does not signifi cantly change the mapping of keys to the slots. To

appreciate this hashing scheme, let's fi rst look at an elementary hashing scheme and understand the

problems that show up as slots are added or removed.

A very rudimentary key allocation strategy among a set of nodes

could involve the use of modulo function. So, 50 keys can be

distributed among 7 nodes like so: key with value 85 goes to

node 1 because 85 modulo 7 is 1 and key with value 18 goes

to node 4 because 18 modulo 7 is 4, and so on for others. This

strategy works well until the number of nodes changes, that

is, newer ones get added or existing ones get removed. When

the number of nodes changes, the modulo function applied

to the existing keys produces a different output and leads to

rearrangement of the keys among the nodes. This isn't that

effective and that's when consistent hashing comes to the rescue.

A

1

2

3

8

4

D

B

7

In consistent hashing, the rearrangement of keys is not

majorly affected when nodes are added or removed. A good

way to explain consistent hashing is to draw out a circle and

mark the nodes on it as shown in Figure 4-15.

5

C

6

FIGURE 4-15

Now the keys themselves are assigned to the nodes that they are closest to. Which means in Figure

4-15, 1, 2, 3 get assigned to node A, 4 gets assigned to B, 5 and 6 get assigned to C, and 7 and 8 to

D. In order to set up such a scheme, you may create a large hash space, say all the SHA1 keys up to

a very large number, and map that onto a circle. Starting from 0 and going clockwise, you would

map all values to a maximum, at which point you would restart at 0. The nodes would also be

hashed and mapped on the same scheme.

Next Page

Professional NoSQL

Search WWH ::

Custom Search

Home