Databases Reference
In-Depth Information
the public via a technical paper, which is available online at www.allthingsdistributed.com/
files/amazon-dynamo-sosp2007.pdf . In due course, the ideas of Dynamo where incorporated
into open-source implementations like Apache Cassandra, Voldemort, Riak, and Dynomite. In this
section the fundamental tenets of an eventually consistent key/value store are discussed. Specifi cs of
any of the open-source implementations are left for later chapters.
Amazon Dynamo powers a lot of the internal Amazon services that drive its massive e-commerce
system. This system has a few essential requirements like high availability and fault tolerance.
However, data sets are structured such that query by primary keys is enough for most cases.
Relational references and joins are not required. Dynamo is built on the ideas of consistent hashing,
object versioning, gossip-based membership protocol, merkle trees, and hinted handoff.
Dynamo supports simple get-and-put-based interface to the data store. Put requests include data
related to object version, which are stored in the context. Dynamo is built to incrementally scale as
the data grows. Thus, it relies on consistent hashing for effective partitioning.
Consistent Hashing
Consistent hashing forms an important principle for distributed hash tables. In consistent hashing,
addition or removal of a slot does not signifi cantly change the mapping of keys to the slots. To
appreciate this hashing scheme, let's fi rst look at an elementary hashing scheme and understand the
problems that show up as slots are added or removed.
A very rudimentary key allocation strategy among a set of nodes
could involve the use of modulo function. So, 50 keys can be
distributed among 7 nodes like so: key with value 85 goes to
node 1 because 85 modulo 7 is 1 and key with value 18 goes
to node 4 because 18 modulo 7 is 4, and so on for others. This
strategy works well until the number of nodes changes, that
is, newer ones get added or existing ones get removed. When
the number of nodes changes, the modulo function applied
to the existing keys produces a different output and leads to
rearrangement of the keys among the nodes. This isn't that
effective and that's when consistent hashing comes to the rescue.
A
1
2
3
8
4
D
B
7
In consistent hashing, the rearrangement of keys is not
majorly affected when nodes are added or removed. A good
way to explain consistent hashing is to draw out a circle and
mark the nodes on it as shown in Figure 4-15.
5
C
6
FIGURE 4-15
Now the keys themselves are assigned to the nodes that they are closest to. Which means in Figure
4-15, 1, 2, 3 get assigned to node A, 4 gets assigned to B, 5 and 6 get assigned to C, and 7 and 8 to
D. In order to set up such a scheme, you may create a large hash space, say all the SHA1 keys up to
a very large number, and map that onto a circle. Starting from 0 and going clockwise, you would
map all values to a maximum, at which point you would restart at 0. The nodes would also be
hashed and mapped on the same scheme.
Search WWH ::




Custom Search