Setting up workers and parallelism to enhance processing
Storm is a highly scalable, distributed, and fault-tolerant real-time parallel processing compute framework. Note that the emphasis is on scalability, distribution, and parallel processing; we already know that Storm operates in clustered mode and is therefore distributed by nature. Scalability was covered in the previous section; now, let's take a closer look at parallelism. We introduced this concept in an earlier chapter, but now we'll get acquainted with how to tune it to achieve the desired performance. The following points are the key criteria for this:
• A topology is allocated a certain number of workers at the time it's started.
• Each component in the topology (spouts and bolts) has a specified number of executors associated with it. These executors define the degree of parallelism for each running component of the topology.
• The overall efficiency and speed of Storm are driven by its parallelism, but we need to understand one thing: all the executors that contribute to parallelism run within the limited set of workers allocated to the topology. So, increasing parallelism helps efficiency only up to a point; beyond that, the executors start contending for resources. Past this point, adding more executors will not improve efficiency, but increasing the number of workers allocated to the topology will make the computation more efficient (see the sketch after this list).
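As a rough illustration of how these two knobs are set, the following minimal sketch configures the number of workers for a topology and a parallelism hint (that is, an executor count) for each component. The EventSpout and ProcessBolt classes are hypothetical placeholders for your own components, and the org.apache.storm package names assume a recent Storm release (older releases use backtype.storm instead).

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class ParallelismTopology {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // Hypothetical spout and bolt classes; replace with your own components.
        // The numeric argument is the parallelism hint, i.e. the number of
        // executors requested for that component.
        builder.setSpout("event-spout", new EventSpout(), 2);
        builder.setBolt("process-bolt", new ProcessBolt(), 4)
               .shuffleGrouping("event-spout");

        Config conf = new Config();
        // Number of worker processes (JVMs) allocated to the topology;
        // all of the executors above run inside these workers.
        conf.setNumWorkers(3);

        StormSubmitter.submitTopology("parallelism-demo", conf, builder.createTopology());
    }
}

In this sketch, the six executors (two for the spout, four for the bolt) are spread across the three workers; raising the parallelism hints without also raising the worker count only packs more executors into the same worker processes.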
Another point to understand in terms of efficiency is network latency; we'll explore this in
the following sections.