CHAPTER 1
Basics
Storm is a distributed, reliable, fault-tolerant system for processing streams of data.
The work is delegated to different types of components, each responsible for a
simple, specific processing task. The input stream of a Storm cluster is handled by a
component called a spout. The spout passes the data to a component called a bolt,
which transforms it in some way. A bolt either persists the data in some sort of storage
or passes it to some other bolt. You can imagine a Storm cluster as a chain of bolt
components that each apply some kind of transformation to the data emitted by the
spout.
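
The spout-and-bolt chain can be sketched in a few lines of plain Python. This is only an illustration of the data flow, not Storm's API: the names number_spout, double_bolt, and store_bolt are hypothetical, the stream is a hard-coded list, and everything runs in a single process rather than on a cluster.

```python
from typing import Iterable, Iterator, List


def number_spout() -> Iterator[int]:
    """Hypothetical spout: the source that feeds data into the chain."""
    yield from [1, 2, 3]


def double_bolt(stream: Iterable[int]) -> Iterator[int]:
    """Hypothetical bolt that transforms each item and passes it on."""
    for item in stream:
        yield item * 2


def store_bolt(stream: Iterable[int], storage: List[int]) -> None:
    """Hypothetical final bolt that persists each item instead of passing it on."""
    for item in stream:
        storage.append(item)


# Wire the chain: spout -> transforming bolt -> persisting bolt.
storage: List[int] = []
store_bolt(double_bolt(number_spout()), storage)
```

Each function sees only the stream handed to it, which mirrors the idea that every component is responsible for one simple task.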
To illustrate this concept, here's a simple example. Last night I was watching the news
when the announcers started talking about politicians and their positions on various
topics. They kept repeating different names, and I wondered whether each name was
mentioned an equal number of times, or whether there was a bias in the number of mentions.
Imagine the subtitles of what the announcers were saying as your input stream of data.
You could have a spout that reads this input from a file (or a socket, via HTTP, or some
other method). As lines of text arrive, the spout hands them to a bolt that separates
lines of text into words. This stream of words is passed to another bolt that compares
each word to a predefined list of politicians' names. With each match, the second bolt
increases a counter for that name in a database. Whenever you want to see the results,
you just query that database, which is updated in real time as data arrives. The
arrangement of all the components (spouts and bolts) and their connections is called a
topology (see Figure 1-1).
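
As a rough sketch of this example, the following plain-Python code models the three components and wires them together. It does not use Storm itself. The names in POLITICIANS, the sample subtitle lines, and all function names are assumptions made for illustration, and an in-memory Counter stands in for the database.

```python
from collections import Counter
from typing import Iterable, Iterator, Set

# Hypothetical predefined list of names to look for.
POLITICIANS: Set[str] = {"alice", "bob"}


def subtitle_spout(lines: Iterable[str]) -> Iterator[str]:
    """Spout: emits subtitle lines (a list here; a file or socket in practice)."""
    yield from lines


def split_words_bolt(lines: Iterable[str]) -> Iterator[str]:
    """First bolt: separates each line into lowercase words without punctuation."""
    for line in lines:
        for word in line.split():
            yield word.strip(".,;:!?").lower()


def count_names_bolt(words: Iterable[str], names: Set[str], counts: Counter) -> None:
    """Second bolt: increases the counter for every word that matches a name."""
    for word in words:
        if word in names:
            counts[word] += 1


def run_topology(lines: Iterable[str]) -> Counter:
    """Connects spout -> splitter -> counter and returns the resulting counts."""
    counts: Counter = Counter()
    count_names_bolt(split_words_bolt(subtitle_spout(lines)), POLITICIANS, counts)
    return counts


# Sample input standing in for the subtitle stream.
counts = run_topology(["Alice met Bob.", "Bob replied to Alice, and Bob left."])
```

Querying counts after the run plays the role of querying the database. In a real cluster the counter would be updated continuously as lines arrive instead of once per batch.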
 