CHAPTER 8
Transactional Topologies
With Storm, you can guarantee message processing by using the ack and fail strategy
described earlier in the book. But what happens if tuples are replayed? How do you
make sure you won't overcount?
Transactional topologies are a feature, introduced in Storm 0.7.0, that provides
messaging semantics to ensure tuples are replayed safely and processed exactly
once. Without support for transactional topologies, you wouldn't be able to count in
a fully accurate, scalable, and fault-tolerant way.
Transactional Topologies are an abstraction built on top of standard
Storm spouts and bolts.
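The core trick behind exactly-once counting is to store the last committed transaction ID alongside each piece of state: if a replayed batch arrives with the same ID, its update is skipped. The sketch below illustrates that idea in plain Java; the class and method names are hypothetical and this is not Storm's actual API, just the idempotent-update pattern it relies on.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Storm's API): each count is stored together
// with the transaction ID that last updated it, so a replayed batch is
// detected and applied only once.
public class IdempotentCounter {
    // key -> {lastCommittedTxId, count}
    private final Map<String, long[]> state = new HashMap<>();

    public void commit(long txId, String key, long delta) {
        long[] entry = state.computeIfAbsent(key, k -> new long[]{-1, 0});
        if (entry[0] == txId) {
            return; // this transaction already committed: a replay, skip it
        }
        entry[0] = txId;
        entry[1] += delta;
    }

    public long count(String key) {
        long[] entry = state.get(key);
        return entry == null ? 0 : entry[1];
    }
}
```

Because the check and the update are keyed by transaction ID, replaying a batch any number of times leaves the count unchanged, which is what makes replays safe.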
The Design
In a transactional topology, Storm uses a mix of parallel and sequential tuple process-
ing. The spout generates batches of tuples that are processed by the bolts in parallel.
Some of those bolts are known as committers, and they commit processed batches in a
strictly ordered fashion. This means that if you have two batches with five tuples each,
both batches will be processed in parallel by the bolts, but the committer bolts won't
commit the second batch until the first batch has been committed successfully.
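This split between a parallel processing phase and a strictly ordered commit phase can be sketched in plain Java (this is a conceptual illustration, not the Storm API; the batch "work" is a stand-in):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual sketch: batches are processed concurrently by a thread
// pool, but their results are committed strictly in batch order.
public class OrderedCommitSketch {
    public static List<Integer> process(int numBatches) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> inFlight = new ArrayList<>();
        // Processing phase: every batch runs in parallel.
        for (int id = 0; id < numBatches; id++) {
            final int batchId = id;
            inFlight.add(pool.submit(() -> batchId * 10)); // stand-in for real work
        }
        // Commit phase: walk the batches in order; batch N+1 is never
        // committed before batch N, no matter which finished processing first.
        List<Integer> committed = new ArrayList<>();
        try {
            for (Future<Integer> f : inFlight) {
                committed.add(f.get());
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return committed;
    }
}
```

The commit loop blocks on each future in submission order, which is what gives the strict ordering even though the processing itself is unordered.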
When dealing with transactional topologies, it is important to be able
to replay a batch of tuples from the source, sometimes even several
times. So make sure the source of data that your spout will be
connected to has the ability to do that.