Replicated mode. This is the mode of deployment in production, on a cluster of machines called
an ensemble. ZooKeeper achieves high availability through replication and can provide service
as long as a majority of the machines in the ensemble are up and running. For example, as seen
in Figure 4.12, in a five-node ensemble any two machines can fail and the service will still work,
because a majority of three remains (a quorum); in a six-node ensemble, however, a failure of
three machines leaves only three, which is not a majority, so the service shuts down. It is usual
to have an odd number of machines in an ensemble, since adding a sixth machine raises the
quorum size to four without tolerating any additional failures.
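As a concrete illustration, a five-node ensemble is defined by listing all five voting members in each machine's zoo.cfg. This is a minimal sketch; the hostnames are hypothetical, and 2888/3888 are the conventional peer-communication and leader-election ports:

    tickTime=2000
    dataDir=/var/lib/zookeeper
    clientPort=2181
    initLimit=10
    syncLimit=5
    # Five voting members: any three form a quorum, so two failures are tolerated.
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888
    server.4=zk4.example.com:2888:3888
    server.5=zk5.example.com:2888:3888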
ZooKeeper has one central task: to ensure that all znode changes are propagated consistently from
the leader to the followers. When a minority of machines fail and recover, those replicas need to
catch up on the updates they missed. To manage the ensemble, ZooKeeper uses a protocol called
Zab (ZooKeeper Atomic Broadcast) that runs in two phases, which may be repeated:
1. Leader election. The machines in an ensemble go through a process of electing a distinguished
member, called the leader; the remaining machines are followers. A client communicates with one
server for the duration of a session and issues read and write operations through it. Writes are
accomplished only through the leader, which broadcasts each update to the followers. Reads can
be served by the leader or by any follower, directly from memory; a follower may lag behind the
leader, so reads are eventually consistent. This phase is finished once a majority (a quorum) of
followers have synchronized their state with the leader. Zab implements the following
optimizations to keep the leader from becoming a bottleneck (a minimal client sketch follows
this list):
- Clients can connect to any server, and each server serves read operations locally and
maintains the session information for its clients. Placing this extra work on follower
processes (processes that are not the leader) distributes the load more evenly.
- The number of servers involved is small, so network communication overhead does not become
the bottleneck that can affect fixed sequencer protocols.
2. Atomic broadcast. Write requests are committed in a two-phase approach in Zab. To maintain
consistency across the ensemble, write requests are always forwarded to the leader, which
broadcasts the update to all its followers. When a quorum of servers has persisted the change
(in the five-node ensemble of Figure 4.12, the leader plus two followers; phase 1 of the
two-phase commit), the leader commits the update (phase 2), and the requestor receives a
response saying the update succeeded. The protocol for achieving consensus is designed to be
atomic, so a change either succeeds or fails completely.
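The read and write paths described above can be seen from the client side with the standard ZooKeeper Java API. This is a minimal sketch, assuming a running ensemble; the hostnames and the /config path are hypothetical:

    import org.apache.zookeeper.*;
    import org.apache.zookeeper.data.Stat;
    import java.util.concurrent.CountDownLatch;

    public class ZabFlowDemo {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);
            // The client may connect to any server in the ensemble; the session
            // is maintained by whichever server it lands on.
            ZooKeeper zk = new ZooKeeper(
                "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
                15000,
                event -> {
                    if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                        connected.countDown();
                    }
                });
            connected.await();

            // Write: forwarded to the leader and committed once a quorum has
            // persisted it (fails with NodeExistsException if /config exists).
            zk.create("/config", "v1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Read: served locally, from memory, by this session's server.
            Stat stat = new Stat();
            byte[] data = zk.getData("/config", false, stat);
            System.out.println(new String(data) + " (zxid " + stat.getMzxid() + ")");

            zk.close();
        }
    }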
Locks and processing
One of the biggest issues in distributed data processing is lock management: at any moment, only
one session may hold an exclusive lock. ZooKeeper manages this by maintaining, under a lock node,
an ordered list of child nodes, one per waiting process; the list forms a queue of processes
waiting for the lock to be released, and the lock is granted to the next waiting process in the
order in which the requests were received.
Lock management is done through watches, but they must be set carefully. If every waiting process
watches the lock node itself, each release wakes all the waiters at once, creating a herd effect
that can overwhelm the ZooKeeper service. Typically, each process instead sets a watch only on the
child node immediately preceding its own, that is, on the process just ahead of it in the queue.
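The queue-and-predecessor-watch recipe above can be sketched with the ZooKeeper Java client. This is a minimal illustration, assuming a pre-existing parent znode /lock; the QueueLock class and node-name prefix are illustrative, not part of the ZooKeeper API:

    import org.apache.zookeeper.*;
    import org.apache.zookeeper.data.Stat;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;

    public class QueueLock {
        private final ZooKeeper zk;
        private String myNode;  // e.g. /lock/node-0000000007

        public QueueLock(ZooKeeper zk) { this.zk = zk; }

        public void acquire() throws Exception {
            // Each waiter enqueues itself as an ephemeral sequential child;
            // the sequence number fixes its position in the queue.
            myNode = zk.create("/lock/node-", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            while (true) {
                List<String> children = zk.getChildren("/lock", false);
                Collections.sort(children);
                String mine = myNode.substring("/lock/".length());
                int i = children.indexOf(mine);
                if (i == 0) return;  // Lowest sequence number holds the lock.

                // Watch only the immediate predecessor, not the lock node
                // itself, to avoid the herd effect.
                CountDownLatch gone = new CountDownLatch(1);
                Stat s = zk.exists("/lock/" + children.get(i - 1),
                        event -> gone.countDown());
                if (s == null) continue;  // Predecessor already gone; re-check.
                gone.await();
            }
        }

        public void release() throws Exception {
            zk.delete(myNode, -1);  // -1 matches any version.
        }
    }

Using an ephemeral node means the lock is released automatically if the holder's session dies, so a crashed process cannot block the queue forever.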
Failure and recovery
A common issue in coordinating a large number of processes is connection loss. When a failover
process takes over, it needs to know which children were affected by the connection loss in order
to complete the recovery.
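As a hedged sketch of how a failover process might distinguish session states and re-read the affected children after a connection loss, the following uses the ZooKeeper Java Watcher interface; the RecoveryWatcher class and the /workers path are illustrative assumptions:

    import org.apache.zookeeper.*;
    import java.util.List;

    public class RecoveryWatcher implements Watcher {
        private ZooKeeper zk;  // set after the client is constructed

        public void setClient(ZooKeeper zk) { this.zk = zk; }

        @Override
        public void process(WatchedEvent event) {
            try {
                switch (event.getState()) {
                    case Disconnected:
                        // The session may still be alive; the client library
                        // transparently retries other servers in the ensemble.
                        break;
                    case SyncConnected:
                        if (zk == null) return;  // not yet wired up
                        // Reconnected: re-read the children to learn what
                        // changed while away, and re-register the watch.
                        List<String> children = zk.getChildren("/workers", this);
                        System.out.println("Live children: " + children);
                        break;
                    case Expired:
                        // The session and its ephemeral znodes are gone; the
                        // failover process must rebuild state in a new session.
                        break;
                    default:
                        break;
                }
            } catch (KeeperException | InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

A caller would construct the client as new ZooKeeper(connectString, timeout, watcher) and then call setClient(zk) so the watcher can re-query after reconnecting.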