CHAPTER 4
Spouts
In this chapter, you'll take a look at the most commonly used strategies for designing
the entry point for a topology (a spout) and how to make spouts fault-tolerant.
Reliable versus Unreliable Messages
When designing a topology, one important thing to keep in mind is message reliability.
If a message can't be processed, you need to decide what to do with the individual
message and what to do with the topology as a whole. For example, when processing
bank deposits, it is important not to lose a single transaction message. But if you're
processing millions of tweets looking for some statistical metric, and one tweet gets
lost, you can assume that the metric will still be fairly accurate.
In Storm, it is the topology author's responsibility to guarantee message reliability according to
the needs of each topology. This involves a trade-off. A reliable topology must manage
lost messages, which requires more resources. A less reliable topology may lose some
messages, but is less resource-intensive. Whatever the chosen reliability strategy, Storm
provides the tools to implement it.
To manage reliability at the spout, you can include a message ID with the tuple at
emit time (collector.emit(new Values(…), tupleId)). The spout's ack and fail methods
are then called with that ID when the tuple is processed correctly or when it fails,
respectively. Tuple processing succeeds when the tuple is processed by all target bolts
and all anchored bolts (you will learn how to anchor a bolt to a tuple in Chapter 5).
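The bookkeeping behind a reliable spout can be sketched without Storm itself. The class below is a hypothetical, plain-Java simulation (ReliableSpoutSketch, enqueue, message, pendingCount, and queuedCount are illustrative names, not Storm's API): it gives every message an ID, remembers it until ack is called with that ID, and queues it for replay when fail is called. In a real spout, nextTuple would emit the message together with its ID and Storm would invoke ack and fail.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, Storm-free simulation of how a reliable spout tracks messages.
class ReliableSpoutSketch {
    // Every message that has not been acked yet, keyed by message ID.
    private final Map<Long, String> messages = new HashMap<>();
    // IDs waiting to be emitted: new messages and failed ones queued for replay.
    private final Deque<Long> toSend = new ArrayDeque<>();
    // IDs emitted but not yet acked or failed.
    private final Set<Long> pending = new HashSet<>();
    private long nextId = 0;

    // Accept a message from the external source and queue it for emission.
    long enqueue(String message) {
        long id = nextId++;
        messages.put(id, message);
        toSend.addLast(id);
        return id;
    }

    // Plays the role of nextTuple(): returns the next ID to emit, or null if idle.
    // A real spout would emit the message here together with this ID.
    Long nextTuple() {
        Long id = toSend.pollFirst();
        if (id != null) {
            pending.add(id);
        }
        return id;
    }

    // Payload for an ID that has not been acked yet (null once acked).
    String message(long id) { return messages.get(id); }

    // Plays the role of ack(msgId): processing succeeded, forget the message.
    void ack(long id) {
        pending.remove(id);
        messages.remove(id);
    }

    // Plays the role of fail(msgId): keep the message and replay it first.
    void fail(long id) {
        if (pending.remove(id)) {
            toSend.addFirst(id);
        }
    }

    int pendingCount() { return pending.size(); }
    int queuedCount() { return toSend.size(); }
}
```

Replaying failed messages ahead of new ones is only one possible choice; a less reliable spout could instead drop a message after a retry limit, trading completeness for fewer resources as described above.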
Tuple processing fails when:
• collector.fail(tuple) is called by the target bolt
• processing time exceeds the configured timeout
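The timeout rule can be illustrated with a small tracker. This is a hypothetical plain-Java sketch (TimeoutTracker, emitted, acked, and expire are illustrative names, not Storm classes): it records when each message ID was emitted and reports as failed every ID that has not been acked within the timeout. Storm applies this rule internally; the timeout itself is configured with topology.message.timeout.secs (Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 30 seconds by default).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: tracks emit times and treats overdue tuples as failed.
class TimeoutTracker {
    private final long timeoutMillis;
    // Message ID -> time (ms) at which the tuple was emitted, in emit order.
    private final Map<Long, Long> emittedAt = new LinkedHashMap<>();

    TimeoutTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record that a tuple with this ID was emitted at the given time.
    void emitted(long id, long nowMillis) {
        emittedAt.put(id, nowMillis);
    }

    // The tuple was acked in time, so stop tracking it.
    void acked(long id) {
        emittedAt.remove(id);
    }

    // Removes and returns every ID whose processing time exceeds the timeout.
    List<Long> expire(long nowMillis) {
        List<Long> failed = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = emittedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                failed.add(entry.getKey());
                it.remove();
            }
        }
        return failed;
    }
}
```

A spout following this idea would call its own fail(id) for every ID returned by expire, which in turn feeds whatever replay logic the spout implements.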
 