Cloudera's Distribution Including Apache Hadoop - Cloudera Administration


making ZooKeeper itself a distributed application. The clients of a ZooKeeper service are the nodes in a cluster. ZooKeeper keeps all of its data in memory, which makes it very fast. A copy of the in-memory representation is also maintained on the disk of each server.
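As a rough illustration of "data in memory, copy on disk", the following is a minimal Python sketch, assuming a toy store that keeps znode data in a dictionary and appends every write to a log file before applying it, so the in-memory state can be rebuilt after a restart. The class name `ToyZnodeStore`, the JSON-lines log format, and the `/demo/config` path are hypothetical; real ZooKeeper servers use their own transaction log and periodic snapshots.

```python
import json
import os
import tempfile


class ToyZnodeStore:
    """Hypothetical in-memory znode store backed by an append-only log on disk."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}  # znode path -> value; all reads are served from here
        # Rebuild the in-memory state by replaying the on-disk log, if present.
        if os.path.exists(log_path):
            with open(log_path, "r", encoding="utf-8") as log:
                for line in log:
                    entry = json.loads(line)
                    self.data[entry["path"]] = entry["value"]

    def set(self, path, value):
        # Persist the write to disk first, then apply it in memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"path": path, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[path] = value

    def get(self, path):
        # Reads never touch the disk.
        return self.data.get(path)


if __name__ == "__main__":
    log_file = os.path.join(tempfile.mkdtemp(), "toy-txn.log")
    store = ToyZnodeStore(log_file)
    store.set("/demo/config", "v1")
    # Simulate a server restart: a fresh instance recovers state from the log.
    recovered = ToyZnodeStore(log_file)
    print(recovered.get("/demo/config"))  # v1
```

Writing to the log before updating the dictionary means a crash can lose at most a write that was never acknowledged, while every read stays a pure in-memory lookup.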

The following diagram shows the high-level workings of the ZooKeeper service:

In the preceding diagram, you see a ZooKeeper service (an ensemble) with five servers: one server is the leader and the other four are followers. Each client (in a Hadoop cluster, every node is a client) connects to exactly one server in the ensemble and reads information from that server. The leader is responsible for performing write operations in ZooKeeper; a write that a client sends to a follower is forwarded to the leader. All servers need to know about the other servers in the ensemble.
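To make the read/write split concrete, here is a minimal Python simulation, assuming a five-server ensemble in which each client is pinned to one server: reads are answered by that server from its local copy, while writes are forwarded to the leader, which applies them and pushes them to the followers. The classes `Server`, `Ensemble`, and `Client`, the fixed choice of server 0 as leader, and the round-robin client assignment are illustrative assumptions; the sketch ignores networking, quorums, and failures.

```python
class Server:
    def __init__(self, server_id):
        self.server_id = server_id
        self.data = {}  # this server's local copy of the znode data


class Ensemble:
    """Toy ensemble: server 0 is the leader, the rest are followers."""

    def __init__(self, size):
        self.servers = [Server(i) for i in range(size)]
        self.leader = self.servers[0]
        self.next_server = 0  # round-robin pointer for assigning clients

    def connect(self):
        # Each client is attached to exactly one server.
        server = self.servers[self.next_server % len(self.servers)]
        self.next_server += 1
        return Client(self, server)

    def write(self, path, value):
        # Only the leader performs writes; it then propagates them to the followers.
        self.leader.data[path] = value
        for server in self.servers:
            if server is not self.leader:
                server.data[path] = value


class Client:
    def __init__(self, ensemble, server):
        self.ensemble = ensemble
        self.server = server

    def read(self, path):
        # Reads are answered locally by the server this client is connected to.
        return self.server.data.get(path)

    def write(self, path, value):
        # Writes go to the leader regardless of which server the client uses.
        self.ensemble.write(path, value)


if __name__ == "__main__":
    ensemble = Ensemble(5)
    clients = [ensemble.connect() for _ in range(5)]
    clients[3].write("/demo/config", "v1")  # sent via a follower, applied by the leader
    print([c.server.server_id for c in clients])  # [0, 1, 2, 3, 4]
    print(clients[1].read("/demo/config"))  # v1
```

Because every server holds a full copy of the data, read traffic spreads across the whole ensemble, while all writes funnel through the single leader.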

Once the leader applies a write operation to a znode, the change is propagated to the followers. If the leader fails, the remaining servers elect one of the followers as the new leader, and the rest remain followers.
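Failover depends on majority quorums: a write counts as committed once a majority of the ensemble has accepted it, and a new leader can be elected only while a majority of servers are still running. The Python sketch below shows that arithmetic for the five-server example, plus a deliberately simplified election rule, assuming each live server reports the ID of the last transaction it has seen (`last_zxid`) and the most up-to-date server wins, with ties broken by the higher server ID. This loosely mirrors ZooKeeper's default election but is not its actual protocol; the function names are hypothetical.

```python
def quorum_size(ensemble_size):
    # A majority of the ensemble: more than half of the servers.
    return ensemble_size // 2 + 1


def max_failures(ensemble_size):
    # How many servers can fail while a majority is still available.
    return ensemble_size - quorum_size(ensemble_size)


def is_committed(acks, ensemble_size):
    # A proposed write is committed once a majority has acknowledged it.
    return acks >= quorum_size(ensemble_size)


def elect_leader(live_servers, ensemble_size):
    """live_servers maps server ID -> last_zxid seen by that server (hypothetical input).

    Returns the new leader's ID, or None if no majority is available.
    """
    if len(live_servers) < quorum_size(ensemble_size):
        return None  # without a quorum, the ensemble cannot elect a leader
    # Prefer the most up-to-date server; break ties with the higher server ID.
    return max(live_servers, key=lambda sid: (live_servers[sid], sid))


if __name__ == "__main__":
    n = 5
    print(quorum_size(n), max_failures(n))  # 3 2
    print(is_committed(3, n))  # True
    # The leader (server 1) has failed; four followers remain.
    followers = {2: 41, 3: 42, 4: 42, 5: 40}
    print(elect_leader(followers, n))  # 4
    print(elect_leader({2: 41, 3: 42}, n))  # None
```

With five servers the ensemble survives two failures; adding a sixth server would raise the quorum to four without tolerating any extra failure, which is why ensembles are usually sized with an odd number of servers.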

The role of ZooKeeper will become clearer when we see how Apache Hadoop uses it for NameNode high availability, which is covered in Chapter 4, Exploring HDFS Federation and Its High Availability.
