Database Reference
In-Depth Information
Spouts
A spout is the collection funnel of a topology; it feeds events or tuples into the topology. It
can be considered as the input source to the Storm processing unit—the topology.
The spout reads messages from external sources such as a queue, file, port, and so on. Also,
the spout emits them into the stream, which in turn passes them to the bolts. It's the task of
the Storm spout to track each event or tuple throughout its processing through the Directed
Acyclic Graph ( DAG ). The Storm framework then sends and generates either acknow-
ledgement or failure notifications based on the outcome of the execution of tuples in the to-
pology. This mechanism gives the guaranteed processing feature to Storm. Based on the re-
quired functionality, spouts can be programmed or configured to be reliable or unreliable.
A reliable spout plays back the failed events into the topology.
The following diagram depicts the same flow, graphically:
All Storm spouts are implemented to be able to emit tuples on one or more stream bolts. As
in the preceding diagram, a spout can emit tuples to both bolt A and C .
Each spout should implement the IRichSpout interface. The following are important meth-
ods to know in context with spout:
nextTuple() : This is the method that keeps on polling the external source for
new events; for instance, the queue in the preceding example. On every poll, if the
method finds an event, it is emitted to the topology through a stream, and if there is
no new event, the method simply returns.
ack() : This method is called when the tuple emitted by the spout has been suc-
cessfully processed by the topology.
fail() : This method is called when a tuple emitted by the spout is not success-
fully processed within the specified timeout. In this case, for reliable spouts, the
spout traces and tracks each tuple with the messageIds event, which are then re-
Search WWH ::




Custom Search