Fault-tolerant: If faults occur during execution of the computation, the
system will reassign tasks as necessary.
Programming language agnostic: Storm tasks and processing components can
be defined in any language, making Storm accessible to nearly anyone. Clojure,
Java, Ruby, and Python are supported by default. Support for other languages
can be added by implementing a simple Storm communication protocol.
The core abstraction in Storm is the stream. A stream is an unbounded sequence
of tuples. Storm provides the primitives for transforming a stream into a new stream
in a distributed and reliable way. The basic primitives for performing stream
transformations are spouts and bolts. A spout is a source of streams. A bolt consumes
any number of input streams, carries out some processing, and possibly emits
new streams. Complex stream transformations, such as computing a stream
of trending topics from a stream of tweets, require multiple steps and thus multiple
bolts. A topology is a graph of stream transformations in which each node is a spout or
a bolt. Edges in the graph indicate which bolts subscribe to which streams, and
therefore how tuples are passed between nodes. When a spout or bolt emits a tuple
to a stream, it sends the tuple to every bolt that subscribes to that stream.
Each node in a Storm topology executes in parallel. For each node, we can
specify how much parallelism is required, and Storm will spawn
that number of threads across the cluster to perform the execution.
Figure 12.6 depicts a sample Storm topology.
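The subscription model described above can be illustrated with a small, single-process sketch. This is a toy model in plain Python, not Storm's API: the `Topology` class, its `add_bolt`, `emit`, and `run` methods, and the stream names `sentences` and `words` are all hypothetical names chosen for illustration. It shows only the routing rule that a tuple emitted to a stream is delivered to every bolt subscribed to that stream; it does not model distribution, parallelism, or reliability.

```python
from collections import defaultdict


class Topology:
    """Toy, single-process model of a spout/bolt graph (not Storm's API)."""

    def __init__(self):
        self.bolts = {}                       # bolt name -> processing function
        self.subscribers = defaultdict(list)  # stream name -> subscribed bolt names

    def add_bolt(self, name, func, inputs):
        # func(tup) must return a list of (stream, tuple) pairs to emit.
        self.bolts[name] = func
        for stream in inputs:
            self.subscribers[stream].append(name)

    def emit(self, stream, tup):
        # Deliver the tuple to every bolt subscribed to this stream,
        # then route whatever each bolt emits in turn.
        for name in self.subscribers[stream]:
            for out_stream, out_tup in self.bolts[name](tup):
                self.emit(out_stream, out_tup)

    def run(self, spout_stream, source):
        # The "spout" here is just an iterable feeding one stream.
        for tup in source:
            self.emit(spout_stream, tup)


counts = defaultdict(int)


def split_bolt(tup):
    # Consumes sentences, emits one tuple per word on the "words" stream.
    (sentence,) = tup
    return [("words", (word,)) for word in sentence.split()]


def count_bolt(tup):
    # Terminal bolt: updates a running count and emits nothing.
    (word,) = tup
    counts[word] += 1
    return []


topology = Topology()
topology.add_bolt("split", split_bolt, inputs=["sentences"])
topology.add_bolt("count", count_bolt, inputs=["words"])
topology.run("sentences", [("the cow jumped",), ("the moon",)])
```

After the run, `counts` holds the word frequencies. A two-bolt chain is used because it mirrors the point that multi-step transformations need multiple bolts.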
Storm relies on the notion of stream grouping to specify how tuples
are sent between processing components. A stream grouping defines how a stream
is partitioned among the tasks of a bolt that subscribes to it. Storm supports
several types of stream grouping, including
1. Shuffle grouping, where stream tuples are randomly distributed across the
bolt's tasks such that each task receives an equal number of tuples
2. Fields grouping, where the tuples are partitioned by the fields specified in
the grouping, so that tuples with the same values for those fields go to the same task
3. All grouping, where the stream tuples are replicated across all of the bolt's tasks
4. Global grouping, where the entire stream goes to a single one of the bolt's tasks
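The four groupings listed above can be sketched as functions that map each tuple to the task indexes that should receive it. This is an illustrative model only, not Storm's implementation: the function names, the `field_indexes` parameter, the CRC32-based hash, and the dealt-in-rounds shuffle are assumptions made for the example, as is routing the global grouping to task 0.

```python
import random
import zlib
from collections import Counter


def shuffle_grouping(tuples, num_tasks, seed=0):
    # Deal tuples out in randomly ordered rounds so that task loads
    # never differ by more than one tuple (an assumed balancing scheme).
    rng = random.Random(seed)
    targets, pending = [], []
    for _ in tuples:
        if not pending:
            pending = list(range(num_tasks))
            rng.shuffle(pending)
        targets.append([pending.pop()])
    return targets


def fields_grouping(tuples, num_tasks, field_indexes):
    # Hash the selected field values; equal keys always map to the same task.
    targets = []
    for tup in tuples:
        key = tuple(tup[i] for i in field_indexes)
        digest = zlib.crc32(repr(key).encode("utf-8"))
        targets.append([digest % num_tasks])
    return targets


def all_grouping(tuples, num_tasks):
    # Every task receives a copy of every tuple.
    return [list(range(num_tasks)) for _ in tuples]


def global_grouping(tuples):
    # The whole stream goes to one task (task 0 in this sketch).
    return [[0] for _ in tuples]


stream = [("alice", 1), ("bob", 2), ("alice", 3), ("carol", 4)]
shuffled = shuffle_grouping(stream, num_tasks=2)
by_user = fields_grouping(stream, num_tasks=3, field_indexes=[0])
replicated = all_grouping(stream, num_tasks=3)
single = global_grouping(stream)

# Tuples received per task under the shuffle grouping.
load = Counter(task for targets in shuffled for task in targets)
```

In this sketch both `"alice"` tuples land on the same task under the fields grouping, which is the property that makes per-key aggregation (for example, a per-user count) possible in a single task.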
FIGURE 12.6
Sample Storm topology.