Replicated mode. This is the mode of deployment in production, on a cluster of machines called
an ensemble. ZooKeeper achieves high availability through replication and can provide service
as long as a majority of the machines in the ensemble are up and running. For example, as seen
in Figure 4.12, in a five-node ensemble any two machines can fail and the service will still work,
because a majority of three remains (a quorum); in a six-node ensemble, however, a failure of
three machines leaves only three, which is not a majority, so the service shuts down. It is usual
to have an odd number of machines in an ensemble, since adding a sixth machine raises the
quorum size to four without tolerating any additional failures.
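As a concrete illustration, a five-node ensemble is defined by listing all five voting members in each machine's zoo.cfg. This is a minimal sketch; the hostnames are hypothetical, and 2888/3888 are the conventional peer-communication and leader-election ports:

    tickTime=2000
    dataDir=/var/lib/zookeeper
    clientPort=2181
    initLimit=10
    syncLimit=5
    # Five voting members: any three form a quorum, so two failures are tolerated.
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888
    server.4=zk4.example.com:2888:3888
    server.5=zk5.example.com:2888:3888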
ZooKeeper has one central task: to ensure that all znode changes are propagated consistently from
the leader to the followers. When a minority of machines fail and recover, those replicas need to
catch up on the updates they missed. To manage the ensemble, ZooKeeper uses a protocol called
Zab (ZooKeeper Atomic Broadcast) that runs in two phases, which may be repeated:
1. Leader election. The machines in an ensemble go through a process of electing a distinguished
member, called the leader; the remaining machines are followers. A client communicates with one
server for the duration of a session and issues read and write operations through it. Writes are
accomplished only through the leader, which broadcasts each update to the followers. Reads can
be served by the leader or by any follower, directly from memory; a follower may lag behind the
leader, so reads are eventually consistent. This phase is finished once a majority (a quorum) of
followers have synchronized their state with the leader. Zab implements the following
optimizations to keep the leader from becoming a bottleneck (a minimal client sketch follows
this list):
- Clients can connect to any server, and each server serves read operations locally and
maintains the session information for its clients. Placing this extra work on follower
processes (processes that are not the leader) distributes the load more evenly.
- The number of servers involved is small, so network communication overhead does not become
the bottleneck that can affect fixed sequencer protocols.
2. Atomic broadcast. Write requests are committed in a two-phase approach in Zab. To maintain
consistency across the ensemble, write requests are always forwarded to the leader, which
broadcasts the update to all its followers. When a quorum of servers has persisted the change
(in the five-node ensemble of Figure 4.12, the leader plus two followers; phase 1 of the
two-phase commit), the leader commits the update (phase 2), and the requestor receives a
response saying the update succeeded. The protocol for achieving consensus is designed to be
atomic, so a change either succeeds or fails completely.
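The read and write paths described above can be seen from the client side with the standard ZooKeeper Java API. This is a minimal sketch, assuming a running ensemble; the hostnames and the /config path are hypothetical:

    import org.apache.zookeeper.*;
    import org.apache.zookeeper.data.Stat;
    import java.util.concurrent.CountDownLatch;

    public class ZabFlowDemo {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);
            // The client may connect to any server in the ensemble; the session
            // is maintained by whichever server it lands on.
            ZooKeeper zk = new ZooKeeper(
                "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
                15000,
                event -> {
                    if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                        connected.countDown();
                    }
                });
            connected.await();

            // Write: forwarded to the leader and committed once a quorum has
            // persisted it (fails with NodeExistsException if /config exists).
            zk.create("/config", "v1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Read: served locally, from memory, by this session's server.
            Stat stat = new Stat();
            byte[] data = zk.getData("/config", false, stat);
            System.out.println(new String(data) + " (zxid " + stat.getMzxid() + ")");

            zk.close();
        }
    }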
Locks and processing
One of the biggest issues in distributed data processing is lock management: at any moment, only
one session may hold an exclusive lock. ZooKeeper manages this by maintaining, under a lock node,
an ordered list of child nodes, one per waiting process; the list forms a queue of processes
waiting for the lock to be released, and the lock is granted to the next waiting process in the
order in which the requests were received.
Lock management is done through watches, but they must be set carefully. If every waiting process
watches the lock node itself, each release wakes all the waiters at once, creating a herd effect
that can overwhelm the ZooKeeper service. Typically, each process instead sets a watch only on the
child node immediately preceding its own, that is, on the process just ahead of it in the queue.
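The queue-and-predecessor-watch recipe above can be sketched with the ZooKeeper Java client. This is a minimal illustration, assuming a pre-existing parent znode /lock; the QueueLock class and node-name prefix are illustrative, not part of the ZooKeeper API:

    import org.apache.zookeeper.*;
    import org.apache.zookeeper.data.Stat;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.CountDownLatch;

    public class QueueLock {
        private final ZooKeeper zk;
        private String myNode;  // e.g. /lock/node-0000000007

        public QueueLock(ZooKeeper zk) { this.zk = zk; }

        public void acquire() throws Exception {
            // Each waiter enqueues itself as an ephemeral sequential child;
            // the sequence number fixes its position in the queue.
            myNode = zk.create("/lock/node-", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            while (true) {
                List<String> children = zk.getChildren("/lock", false);
                Collections.sort(children);
                String mine = myNode.substring("/lock/".length());
                int i = children.indexOf(mine);
                if (i == 0) return;  // Lowest sequence number holds the lock.

                // Watch only the immediate predecessor, not the lock node
                // itself, to avoid the herd effect.
                CountDownLatch gone = new CountDownLatch(1);
                Stat s = zk.exists("/lock/" + children.get(i - 1),
                        event -> gone.countDown());
                if (s == null) continue;  // Predecessor already gone; re-check.
                gone.await();
            }
        }

        public void release() throws Exception {
            zk.delete(myNode, -1);  // -1 matches any version.
        }
    }

Using an ephemeral node means the lock is released automatically if the holder's session dies, so a crashed process cannot block the queue forever.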
Failure and recovery
A common issue in coordinating a large number of processes is connection loss. When a failover
process takes over, it needs to know which children were affected by the connection loss in order
to complete the recovery.
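As a hedged sketch of how a failover process might distinguish session states and re-read the affected children after a connection loss, the following uses the ZooKeeper Java Watcher interface; the RecoveryWatcher class and the /workers path are illustrative assumptions:

    import org.apache.zookeeper.*;
    import java.util.List;

    public class RecoveryWatcher implements Watcher {
        private ZooKeeper zk;  // set after the client is constructed

        public void setClient(ZooKeeper zk) { this.zk = zk; }

        @Override
        public void process(WatchedEvent event) {
            try {
                switch (event.getState()) {
                    case Disconnected:
                        // The session may still be alive; the client library
                        // transparently retries other servers in the ensemble.
                        break;
                    case SyncConnected:
                        if (zk == null) return;  // not yet wired up
                        // Reconnected: re-read the children to learn what
                        // changed while away, and re-register the watch.
                        List<String> children = zk.getChildren("/workers", this);
                        System.out.println("Live children: " + children);
                        break;
                    case Expired:
                        // The session and its ephemeral znodes are gone; the
                        // failover process must rebuild state in a new session.
                        break;
                    default:
                        break;
                }
            } catch (KeeperException | InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

A caller would construct the client as new ZooKeeper(connectString, timeout, watcher) and then call setClient(zk) so the watcher can re-query after reconnecting.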