Database Reference
In-Depth Information
The pseudocode for lock acquisition is as follows:
1. Create an ephemeral sequential znode named lock- under the lock znode, and re-
member its actual pathname (the return value of the create operation).
2. Get the children of the lock znode and set a watch.
3. If the pathname of the znode created in step 1 has the lowest number of the chil-
dren returned in step 2, then the lock has been acquired. Exit.
4. Wait for the notification from the watch set in step 2, and go to step 2.
The herd effect
Although this algorithm is correct, there are some problems with it. The first problem is
that this implementation suffers from the herd effect . Consider hundreds or thousands of
clients, all trying to acquire the lock. Each client places a watch on the lock znode for
changes in its set of children. Every time the lock is released or another process starts the
lock acquisition process, the watch fires, and every client receives a notification. The
“herd effect” refers to a large number of clients being notified of the same event when
only a small number of them can actually proceed. In this case, only one client will suc-
cessfully acquire the lock, and the process of maintaining and sending watch events to all
clients causes traffic spikes, which put pressure on the ZooKeeper servers.
To avoid the herd effect, the condition for notification needs to be refined. The key obser-
vation for implementing locks is that a client needs to be notified only when the child
znode with the previous sequence number goes away, not when any child znode is deleted
(or created). In our example, if clients have created the znodes /leader/lock-1 , /leader/
lock-2 , and /leader/lock-3 , then the client holding /leader/lock-3 needs to be notified only
when /leader/lock-2 disappears. It does not need to be notified when /leader/lock-1 disap-
pears or when a new znode, /leader/lock-4 , is added.
Recoverable exceptions
Another problem with the lock algorithm as it stands is that it doesn't handle the case
when the create operation fails due to connection loss. Recall that in this case we do not
know whether the operation succeeded or failed. Creating a sequential znode is a
nonidempotent operation, so we can't simply retry, because if the first create had suc-
ceeded we would have an orphaned znode that would never be deleted (until the client
session ended, at least). Deadlock would be the unfortunate result.
The problem is that after reconnecting, the client can't tell whether it created any of the
child znodes. By embedding an identifier in the znode name, if it suffers a connection
loss, it can check to see whether any of the children of the lock node have its identifier in
Search WWH ::




Custom Search