3.2.1 Chord
Chord [Stoica et al., 2001a] is among the pioneering efforts in the design
and implementation of a robust P2P network architecture for data storage
and lookup. Chord is an ingenious improvement on the consistent hashing
[Karger et al., 1997] idea, which already has at least a couple of nice
features. First, with high probability the hashing provides a balanced
load over the network, in the sense that each node receives more or less the
same number of data items for storage. Second, with high probability, when
the N-th node joins (or leaves), only an O(1/N) fraction of the data items
need to be moved to a different node in the network. However, consistent
hashing originally requires every node to know about every other node in the
network. If this requirement is dropped, then to look up an item, potentially
(in the worst case) all N nodes in the system have to be checked. This is
obviously a big obstacle to scalability. Chord improves on this by requiring
each node to store information about only O(log N) other nodes, and
consequently a lookup requires only O(log N) messages. A join or leave event
entails only about O(log^2 N) messages.
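As a quick numerical illustration of the load-balancing claim, the following
short Python simulation (our own sketch, not from the original text; it uses
the successor-mapping rule made precise in the next paragraph) hashes 50
random "node addresses" and 5,000 random data items onto a 32-bit ring and
reports the spread of items per node:

    import bisect
    import hashlib
    import os

    M = 32            # identifier bits for this simulation; the text only asks that m be "large enough"
    RING = 2 ** M

    def ident(value: bytes) -> int:
        """Truncate SHA-1 output to an m-bit identifier."""
        return int.from_bytes(hashlib.sha1(value).digest(), "big") % RING

    nodes = sorted(ident(os.urandom(6)) for _ in range(50))   # 50 random node identifiers
    load = {n: 0 for n in nodes}
    for _ in range(5000):
        k = ident(os.urandom(16))                 # a random data item's identifier
        i = bisect.bisect_left(nodes, k)          # index of the first node with identifier >= k
        load[nodes[i % len(nodes)]] += 1          # wrap around the ring if k exceeds all nodes

    # Average load is 5000/50 = 100 per node; min and max show the spread around it.
    print(min(load.values()), max(load.values()))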
In consistent hashing, a hash function, e.g., SHA-1, is used to hash an
input value to an m-bit identifier. The input can be the IP address of a
node, the contents of a file, etc. The only requirement on the parameter m
is that it be large enough that the hashed outputs have a negligible
probability of colliding. Now, as everything is turned into an m-bit
identifier, we can consider an identifier space organized as a ring: a
circular list of the numbers from 0 to 2^m - 1 (so that all arithmetic is
modulo 2^m). Any data item (e.g., a file) with identifier k (i.e., its
hashed output is k) is mapped to the node (i.e., a participating machine)
whose identifier equals k, or to the first node whose identifier follows k
(notice that in practice certainly not every identifier in the ring will
currently have a participating machine). Such a node is defined as
successor(k). Figure 3.1 shows an example with m = 3 [Stoica et al., 2001a]
in which three nodes are currently participating (nodes 0, 1, and 3). Now,
if we need to store three data items with identifiers 1, 2, and 6, then
according to the above definition of the successor relationship, data item 1
is stored at node 1, data item 2 at node 3, and data item 6 at node 0.
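To make the successor mapping concrete, here is a minimal Python sketch (our
own illustration, not code from [Stoica et al., 2001a]; the helper names
ident and successor are ours) that truncates SHA-1 output to m bits and
reproduces the m = 3 example above:

    import hashlib

    M = 3                      # identifier length in bits, as in Figure 3.1
    RING = 2 ** M              # size of the identifier space: 2^m = 8
    NODES = sorted([0, 1, 3])  # identifiers of the currently participating nodes

    def ident(value: bytes) -> int:
        """Hash an input (a node's IP address, a file's contents, ...) to an
        m-bit identifier by truncating SHA-1 output modulo 2^m."""
        return int.from_bytes(hashlib.sha1(value).digest(), "big") % RING

    def successor(k: int) -> int:
        """Return the first participating node whose identifier is >= k,
        wrapping around the ring past 2^m - 1 if necessary."""
        for n in NODES:
            if n >= k:
                return n
        return NODES[0]

    for item in (1, 2, 6):
        print(f"data item {item} -> node {successor(item)}")
    # prints: item 1 -> node 1, item 2 -> node 3, item 6 -> node 0

In a real deployment the node and item identifiers would of course come from
a hash such as ident() rather than being fixed by hand as in this toy ring.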
As a minimum requirement, each node needs to know (i.e., keep a piece of
state information such as the IP address of) only its successor node in the
ring in order to accomplish the process of data lookup. However, as mentioned
above, in the worst case a lookup may then traverse all N nodes currently in
the network. Here is how Chord ingeniously improves this aspect. Specifically,
each node maintains a small routing table, called the finger table, which
contains at most m entries (i.e., O(log N) entries). The i-th entry (or the
i-th finger) in the finger table of node n records the first node s that
succeeds n by at least 2^(i-1) on the Chord ring, i.e., s =
successor(n + 2^(i-1)), where 1 ≤ i ≤ m and the arithmetic is modulo 2^m
(notice that the identifier n + 2^(i-1) itself does not necessarily
correspond to an existing machine).
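Continuing the same toy three-node ring, a brief sketch of the finger-table
construction (again our own Python, under the same assumptions as the
previous snippet) looks as follows:

    M = 3                      # identifier bits, as in Figure 3.1
    RING = 2 ** M
    NODES = sorted([0, 1, 3])  # currently participating nodes

    def successor(k: int) -> int:
        """First participating node with identifier >= k, wrapping around."""
        for n in NODES:
            if n >= k:
                return n
        return NODES[0]

    def finger_table(n: int) -> list[int]:
        """The i-th finger of node n is successor((n + 2^(i-1)) mod 2^m),
        for i = 1, ..., m."""
        return [successor((n + 2 ** (i - 1)) % RING) for i in range(1, M + 1)]

    print(finger_table(1))   # node 1's fingers target identifiers 2, 3, 5 -> [3, 3, 0]

Each finger lets a node forward a lookup at least halfway across the
remaining identifier distance to the target, which is what yields the
O(log N) bound on lookup messages.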