Processing Streaming Data - Real-Time Analytics

Figure 5.1

ZooKeeper

A distributed Storm cluster relies on ZooKeeper for coordinating the cluster.

Communication between nodes is peer-to-peer, so the load on the

ZooKeeper cluster is not very high as it is only used to manage metadata

for each of the nodes. If using a data motion system that also relies on

ZooKeeper, such as Kafka, it is fine to share that same ZooKeeper cluster

with Storm, provided it has enough capacity.

ZooKeeper keeps track of all running topologies as well as the status of all

supervisors.

The Nimbus

The nimbus is the ringleader of the Storm circus. It is responsible for the

distribution of individual tasks for either spouts or bolts across the workers

in the cluster. It is also responsible for rebalancing a Storm cluster in the

event a supervisor has crashed.
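
The distribution and rebalancing role described above can be illustrated with a small conceptual model. This is only a sketch under stated assumptions: the function names, the round-robin policy, and the modeling of workers as (supervisor, slot) pairs are all hypothetical and are not Storm's scheduler or API. It also reassigns every task after a crash, which is a simplification.

```python
from collections import defaultdict


def assign_tasks(tasks, workers):
    """Spread tasks over workers round-robin (a toy stand-in for scheduling)."""
    if not workers:
        raise ValueError("no live workers to assign tasks to")
    assignment = defaultdict(list)
    for i, task in enumerate(tasks):
        assignment[workers[i % len(workers)]].append(task)
    return dict(assignment)


def rebalance(assignment, workers, crashed_supervisor):
    """Reassign all tasks to the workers that are not on the crashed supervisor.

    Workers are modeled as (supervisor, slot) tuples; this is an assumption
    made for the sketch, not how Storm represents them.
    """
    survivors = [w for w in workers if w[0] != crashed_supervisor]
    all_tasks = sorted(t for ts in assignment.values() for t in ts)
    return assign_tasks(all_tasks, survivors)


# Hypothetical cluster: two supervisors with two worker slots each.
workers = [("sup-a", 0), ("sup-a", 1), ("sup-b", 0), ("sup-b", 1)]
tasks = ["spout-0", "spout-1", "bolt-0", "bolt-1", "bolt-2", "bolt-3"]

before = assign_tasks(tasks, workers)
after = rebalance(before, workers, crashed_supervisor="sup-b")
```

After the simulated crash of `sup-b`, every spout and bolt task still has a home, but only on workers belonging to `sup-a`.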

At first glance, the nimbus would appear to be a single point of failure,

like the JobTracker or NameNode in a traditional Hadoop cluster. For

Storm, this is not strictly the case. When the nimbus goes down, existing

topologies continue to function unimpeded. The nimbus is not involved

in the moment-to-moment processing, only in the distribution of processing

tasks to the supervisor nodes.

When the nimbus crashes, the cluster is no longer able to manage

topologies. This includes starting new topologies and rebalancing existing

topologies in the event of failure. A crashed nimbus should be restarted as

quickly as possible, but its failure will not immediately bring down the

cluster.
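
The split between management and processing described above can be made concrete with a toy model. This is a conceptual sketch only: the `ToyCluster` class and its methods are hypothetical and do not correspond to any Storm API. It assumes that processing depends only on the workers, while starting a topology requires a live nimbus.

```python
class ToyCluster:
    """Conceptual model only; none of these names come from Storm's API."""

    def __init__(self):
        self.nimbus_up = True
        self.running = {}  # topology name -> tuples processed so far

    def submit(self, name):
        # Starting a topology is a management action, so it needs the nimbus.
        if not self.nimbus_up:
            raise RuntimeError("nimbus is down: cannot start topologies")
        self.running[name] = 0

    def process(self, name, n_tuples):
        # Moment-to-moment processing happens on the workers, not the nimbus.
        self.running[name] += n_tuples
        return self.running[name]


cluster = ToyCluster()
cluster.submit("clicks")
cluster.process("clicks", 100)

cluster.nimbus_up = False              # simulate a nimbus crash
total = cluster.process("clicks", 50)  # the existing topology keeps running

try:
    cluster.submit("purchases")        # a new topology cannot be started
    submitted = True
except RuntimeError:
    submitted = False
```

In this model the running topology keeps counting tuples while the nimbus is down, and only the attempt to start a new topology fails.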
