This creates a spout named “input” and returns a SpoutDeclarer that is used
to further configure the Spout. Most topologies contain only a single
IRichSpout type, but Storm often creates more than one Spout task to improve
the parallelism and durability of the input process. The maximum number of
tasks and other related variables are all set using the returned
SpoutDeclarer.
Bolts are Storm's basic unit of computation. Like spouts, they must have a
unique name, and they are added via the setBolt method. For example, this
adds a new bolt named “processing” to the topology:

    builder.setBolt("processing", new BasicBolt())
        .shuffleGrouping("input");

The setBolt method returns a BoltDeclarer object that is used to connect the
bolt to the rest of the topology. Bolts are attached to the topology by using
one of the grouping methods to define an input vertex. In this case, the
“input” spout vertex defined earlier is used.
Storm creates several tasks for each defined Bolt to improve parallelism and
durability. Because this may affect computations like aggregation, the
grouping methods define how data is sent to each of the individual tasks.
This is similar to how partition functions are used in map-reduce
implementations, but with a bit more flexibility. Storm offers a number of
grouping methods to organize the flow of data in the topology.
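To make the partition-function analogy concrete, the following is a minimal sketch in plain Java, not Storm code. The class KeyPartitionSketch and its taskFor method are hypothetical names introduced only for illustration, and hashing the key values modulo the task count is an assumed scheme; Storm's own routing may differ in detail. The point is that equal keys always reach the same task, which is what a per-key aggregation needs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a partition function; not part of Storm's API.
class KeyPartitionSketch {

    // Picks a task index in [0, numTasks) from the values of the grouping keys.
    // Lists with equal contents have equal hash codes, so equal key values
    // always map to the same task.
    static int taskFor(List<Object> keyValues, int numTasks) {
        return Math.floorMod(keyValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4; // assumed number of bolt tasks
        List<Object> first = Arrays.<Object>asList("user-1", "click");
        List<Object> second = Arrays.<Object>asList("user-1", "click");

        // Two tuples with the same key values are routed to the same task.
        System.out.println(taskFor(first, numTasks) == taskFor(second, numTasks)); // prints true
    }
}
```

Math.floorMod is used instead of the % operator so that a negative hash code still yields a valid task index.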
Shuffle Groupings
In the earlier example, the shuffleGrouping method is used to distribute data
among the Bolt tasks. This method randomly shuffles each data element, called
a Tuple, among the Bolt tasks such that each task gets roughly the same
amount of data.
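The balancing property described above can be sketched in plain Java. This is an illustration under assumptions, not Storm's implementation: the class ShuffleSketch and its distribute method are hypothetical, and walking a reshuffled list of task indexes is just one simple way to get a random order with near-equal counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of shuffle-style distribution; not Storm code.
class ShuffleSketch {

    // Assigns numTuples tuples to numTasks tasks by walking a randomly
    // shuffled list of task indexes and reshuffling after each full pass.
    // Returns how many tuples each task received.
    static int[] distribute(int numTuples, int numTasks, Random rng) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            order.add(i);
        }
        Collections.shuffle(order, rng);

        int[] counts = new int[numTasks];
        int pos = 0;
        for (int i = 0; i < numTuples; i++) {
            if (pos == numTasks) {
                // Every task has received one tuple in this pass; start a new one.
                Collections.shuffle(order, rng);
                pos = 0;
            }
            counts[order.get(pos++)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 1000 tuples over 4 tasks is 250 complete passes.
        int[] counts = distribute(1000, 4, new Random());
        System.out.println(Arrays.toString(counts)); // prints [250, 250, 250, 250]
    }
}
```

With this scheme the per-task counts never differ by more than one, while the task that receives any particular tuple remains unpredictable.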
Field Groupings
Another common grouping is the fieldsGrouping method. This grouping uses one
or more of the named elements of a tuple, which are defined by the input
vertex, to determine the task that will receive a particular data element.
This is essentially equivalent to a SQL GROUP BY clause and is often used
when implementing aggregation bolts. To group data from “input” by key1 and
key2, add the bolt as follows: