ZooKeeper - Hadoop: The Definitive Guide

Database Reference

In-Depth Information

The pseudocode for lock acquisition is as follows:

1. Create an ephemeral sequential znode named lock- under the lock znode, and re-

member its actual pathname (the return value of the create operation).

2. Get the children of the lock znode and set a watch.

3. If the pathname of the znode created in step 1 has the lowest number of the chil-

dren returned in step 2, then the lock has been acquired. Exit.

4. Wait for the notification from the watch set in step 2, and go to step 2.

The herd effect

Although this algorithm is correct, there are some problems with it. The first problem is

that this implementation suffers from the herd effect . Consider hundreds or thousands of

clients, all trying to acquire the lock. Each client places a watch on the lock znode for

changes in its set of children. Every time the lock is released or another process starts the

lock acquisition process, the watch fires, and every client receives a notification. The

“herd effect” refers to a large number of clients being notified of the same event when

only a small number of them can actually proceed. In this case, only one client will suc-

cessfully acquire the lock, and the process of maintaining and sending watch events to all

clients causes traffic spikes, which put pressure on the ZooKeeper servers.

To avoid the herd effect, the condition for notification needs to be refined. The key obser-

vation for implementing locks is that a client needs to be notified only when the child

znode with the previous sequence number goes away, not when any child znode is deleted

(or created). In our example, if clients have created the znodes /leader/lock-1 , /leader/

lock-2 , and /leader/lock-3 , then the client holding /leader/lock-3 needs to be notified only

when /leader/lock-2 disappears. It does not need to be notified when /leader/lock-1 disap-

pears or when a new znode, /leader/lock-4 , is added.

Recoverable exceptions

Another problem with the lock algorithm as it stands is that it doesn't handle the case

when the create operation fails due to connection loss. Recall that in this case we do not

know whether the operation succeeded or failed. Creating a sequential znode is a

nonidempotent operation, so we can't simply retry, because if the first create had suc-

ceeded we would have an orphaned znode that would never be deleted (until the client

session ended, at least). Deadlock would be the unfortunate result.

The problem is that after reconnecting, the client can't tell whether it created any of the

child znodes. By embedding an identifier in the znode name, if it suffers a connection

loss, it can check to see whether any of the children of the lock node have its identifier in

Search WWH ::

Custom Search

Home