Spouts - Getting Started with Storm


CHAPTER 4

Spouts

In this chapter, you'll take a look at the most commonly used strategies for designing the entry point for a topology (a spout) and how to make spouts fault-tolerant.

Reliable versus Unreliable Messages

When designing a topology, one important thing to keep in mind is message reliability. If a message can't be processed, you need to decide what to do with the individual message and what to do with the topology as a whole. For example, when processing bank deposits, it is important not to lose a single transaction message. But if you're processing millions of tweets looking for some statistical metric, and one tweet gets lost, you can assume that the metric will still be fairly accurate.

In Storm, it is the topology author's responsibility to guarantee message reliability according to the needs of each topology. This involves a trade-off. A reliable topology must track and recover lost messages, which requires more resources. A less reliable topology may lose some messages, but is less resource-intensive. Whatever the chosen reliability strategy, Storm provides the tools to implement it.

To manage reliability at the spout, you can include a message ID with the tuple at emit time (collector.emit(new Values(…), tupleId)). The spout's ack and fail methods are then called when a tuple is processed correctly or fails, respectively. Tuple processing succeeds when the tuple has been processed by all target bolts and by all bolts that receive tuples anchored to it (you will learn about anchoring in Chapter 5).

Tuple processing fails when:

• collector.fail(tuple) is called by a target bolt

• processing time exceeds the configured timeout
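
The bookkeeping a reliable spout needs can be sketched without any Storm dependency. The class below is a hypothetical illustration, not part of the Storm API: the name PendingTracker, its methods, and the retry limit are all assumptions made for this example. It remembers each message under the ID it was emitted with, forgets it on ack, and queues it for re-emission on fail until a retry limit is reached. In a real spout, the ack and fail callbacks receive the message ID as an Object; an int is used here only to keep the sketch short.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical helper: tracks messages a spout has emitted but not yet seen acked.
class PendingTracker {
    private final int maxRetries;
    // Message ID -> payload still waiting for an ack.
    private final Map<Integer, String> pending = new HashMap<>();
    // Message ID -> number of failures reported so far.
    private final Map<Integer, Integer> failures = new HashMap<>();
    // IDs that failed and should be emitted again.
    private final Queue<Integer> replay = new ArrayDeque<>();

    PendingTracker(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Call at emit time, with the same ID that was passed along with the tuple.
    void emitted(int id, String payload) {
        pending.put(id, payload);
    }

    // Call from the spout's ack callback: the message is done, so forget it.
    void ack(int id) {
        pending.remove(id);
        failures.remove(id);
    }

    // Call from the spout's fail callback (explicit failure or timeout).
    // Returns true if the message was queued for replay, false if it was
    // unknown or has exceeded the retry limit and was dropped.
    boolean fail(int id) {
        if (!pending.containsKey(id)) {
            return false;
        }
        int count = failures.merge(id, 1, Integer::sum);
        if (count > maxRetries) {
            pending.remove(id);
            failures.remove(id);
            return false;
        }
        replay.add(id);
        return true;
    }

    // Next payload to emit again, or null if nothing is waiting.
    // IDs that were acked after being queued are skipped.
    String nextReplay() {
        Integer id;
        while ((id = replay.poll()) != null) {
            String payload = pending.get(id);
            if (payload != null) {
                return payload;
            }
        }
        return null;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

A spout built along these lines would call emitted() next to each collector.emit, delegate its ack and fail methods to the tracker, and check nextReplay() before reading new input. For the tweet-counting case, where an occasional loss is acceptable, the spout could instead emit without a message ID and skip this bookkeeping entirely.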
