DOES ZOOKEEPER USE PAXOS?
No. ZooKeeper's Zab protocol is not the same as the well-known Paxos algorithm.[145] Zab is similar, but it differs in several aspects of its operation, such as relying on TCP for its message ordering guarantees.[146]
If the leader fails, the remaining machines hold another leader election and continue as before with the new leader. If the old leader later recovers, it then starts as a follower. Leader election is very fast, around 200 ms according to one published result, so performance does not noticeably degrade during an election.
All machines in the ensemble write updates to disk before updating their in-memory copies of the znode tree. Read requests may be serviced from any machine, and because they involve only a lookup from memory, they are very fast.
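The ordering described above (persist first, then update memory, then serve reads from memory) can be illustrated with a small toy model. This is only a sketch and not ZooKeeper's actual code: the ToyServer class, the JSON-lines log format, and the demo function are all hypothetical names invented for this illustration.

```python
import json
import os
import tempfile


class ToyServer:
    """Hypothetical stand-in for one ensemble member (not ZooKeeper code)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.znodes = {}  # in-memory copy of the znode tree: path -> data

    def apply_update(self, path, data):
        # Step 1: persist the update to disk and force it out of OS buffers.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"path": path, "data": data}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # Step 2: only after the write is durable, update the in-memory copy.
        self.znodes[path] = data

    def read(self, path):
        # Reads never touch the disk; they are a plain dictionary lookup.
        return self.znodes.get(path)


def demo():
    """Apply one update on three toy servers, then read it back from each."""
    with tempfile.TemporaryDirectory() as tmp:
        servers = [
            ToyServer(os.path.join(tmp, f"server{i}.log")) for i in range(3)
        ]
        for server in servers:
            server.apply_update("/config", "v1")
        reads = [server.read("/config") for server in servers]
        with open(servers[0].log_path, encoding="utf-8") as log:
            logged = [json.loads(line) for line in log]
        return reads, logged
```

Calling demo() returns the value read from each toy server together with the records found in the first server's log. The sketch deliberately leaves out leader election, ordering between servers, and failure handling; it shows only why a read can be answered from memory while every accepted update is already on disk.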
Consistency
Understanding the basis of ZooKeeper's implementation helps in understanding the consistency guarantees that the service makes. The terms “leader” and “follower” for the machines in an ensemble are apt because they make the point that a follower may lag the leader by a number of updates. This is a consequence of the fact that only a majority and not all members of the ensemble need to have persisted a change before it is committed. A good mental model for ZooKeeper is of clients connected to ZooKeeper servers that are following the leader. A client may actually be connected to the leader, but it has no control