3.2.1 Chord
Chord [Stoica et al., 2001a] is among the pioneering efforts in the design
and implementation of a robust P2P network architecture for data storage
and lookup. Chord is an ingenious improvement on the consistent hashing
[Karger et al., 1997] idea, which already has at least a couple of nice
features. First, with high probability the hashing provides a balanced
load over the network, in the sense that each node receives more or less the
same number of data items for storage. Second, with high probability, when
the N-th node joins (or leaves), only an O(1/N) fraction of the data items
need to be moved to a different node in the network. However, consistent
hashing originally requires every node to know about every other node in the
network. If this requirement is dropped, then to look up an item, potentially
(in the worst case) all N nodes in the system have to be checked. This is
obviously a big obstacle to scalability. Chord improves on this by requiring
each node to store information about only O(log N) other nodes, and
consequently a lookup requires only O(log N) messages. A join or leave event
entails only about O(log^2 N) messages.
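As a quick numerical illustration of the load-balancing claim, the following
short Python simulation (our own sketch, not from the original text; it uses
the successor-mapping rule made precise in the next paragraph) hashes 50
random "node addresses" and 5,000 random data items onto a 32-bit ring and
reports the spread of items per node:

    import bisect
    import hashlib
    import os

    M = 32            # identifier bits for this simulation; the text only asks that m be "large enough"
    RING = 2 ** M

    def ident(value: bytes) -> int:
        """Truncate SHA-1 output to an m-bit identifier."""
        return int.from_bytes(hashlib.sha1(value).digest(), "big") % RING

    nodes = sorted(ident(os.urandom(6)) for _ in range(50))   # 50 random node identifiers
    load = {n: 0 for n in nodes}
    for _ in range(5000):
        k = ident(os.urandom(16))                 # a random data item's identifier
        i = bisect.bisect_left(nodes, k)          # index of the first node with identifier >= k
        load[nodes[i % len(nodes)]] += 1          # wrap around the ring if k exceeds all nodes

    # Average load is 5000/50 = 100 per node; min and max show the spread around it.
    print(min(load.values()), max(load.values()))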
In consistent hashing, a hash function, e.g., SHA-1, is used to hash an
input value to an m-bit identifier. The input can be the IP address of a
node, the contents of a file, etc. The only requirement on the parameter m
is that it be large enough that the hashed outputs have a negligible
probability of colliding. Now, as everything is turned into an m-bit
identifier, we can consider an identifier space organized as a ring: a
circular list of the numbers from 0 to 2^m - 1 (so that all arithmetic is
modulo 2^m). Any data item (e.g., a file) with identifier k (i.e., its
hashed output is k) is mapped to the node (i.e., a participating machine)
whose identifier equals k, or to the first node whose identifier follows k
(notice that in practice certainly not every identifier in the ring will
currently have a participating machine). Such a node is defined as
successor(k). Figure 3.1 shows an example with m = 3 [Stoica et al., 2001a]
in which three nodes are currently participating (nodes 0, 1, and 3). Now,
if we need to store three data items with identifiers 1, 2, and 6, then
according to the above definition of the successor relationship, data item 1
is stored at node 1, data item 2 at node 3, and data item 6 at node 0.
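To make the successor mapping concrete, here is a minimal Python sketch (our
own illustration, not code from [Stoica et al., 2001a]; the helper names
ident and successor are ours) that truncates SHA-1 output to m bits and
reproduces the m = 3 example above:

    import hashlib

    M = 3                      # identifier length in bits, as in Figure 3.1
    RING = 2 ** M              # size of the identifier space: 2^m = 8
    NODES = sorted([0, 1, 3])  # identifiers of the currently participating nodes

    def ident(value: bytes) -> int:
        """Hash an input (a node's IP address, a file's contents, ...) to an
        m-bit identifier by truncating SHA-1 output modulo 2^m."""
        return int.from_bytes(hashlib.sha1(value).digest(), "big") % RING

    def successor(k: int) -> int:
        """Return the first participating node whose identifier is >= k,
        wrapping around the ring past 2^m - 1 if necessary."""
        for n in NODES:
            if n >= k:
                return n
        return NODES[0]

    for item in (1, 2, 6):
        print(f"data item {item} -> node {successor(item)}")
    # prints: item 1 -> node 1, item 2 -> node 3, item 6 -> node 0

In a real deployment the node and item identifiers would of course come from
a hash such as ident() rather than being fixed by hand as in this toy ring.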
As a minimum requirement, each node needs to know (i.e., keep a piece of
state information such as the IP address of) only its successor node in the
ring in order to accomplish the process of data lookup. However, as mentioned
above, in the worst case a lookup may then traverse all N nodes currently in
the network. Here is how Chord ingeniously improves this aspect. Specifically,
each node maintains a small routing table, called the finger table, which
contains at most m entries (i.e., O(log N) entries). The i-th entry (or the
i-th finger) in the finger table of node n records the first node s that
succeeds n by at least 2^(i-1) on the Chord ring, i.e., s =
successor(n + 2^(i-1)), where 1 ≤ i ≤ m and the arithmetic is modulo 2^m
(notice that the identifier n + 2^(i-1) itself does not necessarily
correspond to an existing machine).
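Continuing the same toy three-node ring, a brief sketch of the finger-table
construction (again our own Python, under the same assumptions as the
previous snippet) looks as follows:

    M = 3                      # identifier bits, as in Figure 3.1
    RING = 2 ** M
    NODES = sorted([0, 1, 3])  # currently participating nodes

    def successor(k: int) -> int:
        """First participating node with identifier >= k, wrapping around."""
        for n in NODES:
            if n >= k:
                return n
        return NODES[0]

    def finger_table(n: int) -> list[int]:
        """The i-th finger of node n is successor((n + 2^(i-1)) mod 2^m),
        for i = 1, ..., m."""
        return [successor((n + 2 ** (i - 1)) % RING) for i in range(1, M + 1)]

    print(finger_table(1))   # node 1's fingers target identifiers 2, 3, 5 -> [3, 3, 0]

Each finger lets a node forward a lookup at least halfway across the
remaining identifier distance to the target, which is what yields the
O(log N) bound on lookup messages.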