Now my spout reads the stream and emits it into a field labeled sentence . In the next step, we split the sentence into words; this is the very same functionality we deployed earlier in the wordCount topology.
The following code shows the working of the split functionality:
public class Split extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
        // Each incoming tuple carries one sentence in field 0
        String sentence = tuple.getString(0);
        for (String word : sentence.split(" ")) {
            collector.emit(new Values(word));
        }
    }
}
This very simple function splits the sentence on whitespace and emits each word as a tuple. Beyond this point, the topology computes the count and stores the results persistently. It does so in the following steps:
1. We group the stream by the word field.
2. We aggregate and persist each group using the count aggregator.
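In Trident these two steps correspond to groupBy followed by persistentAggregate with the built-in Count aggregator. The following is a plain-Java sketch of what that grouping and counting amounts to for a single batch; the class and method names are illustrative, not Storm's API:

```java
import java.util.*;

public class GroupAndCount {
    // Groups words and counts occurrences per group, mirroring what
    // groupBy on the word field plus a Count aggregator produce for one batch.
    static Map<String, Long> countWords(List<String> words) {
        Map<String, Long> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1L, Long::sum); // increment this group's count
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(
                Arrays.asList("the", "cow", "jumped", "over", "the", "moon")));
    }
}
```

In the real topology, the per-batch counts produced this way are merged into the persistent state rather than printed.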
The persistent function should be written so that it stores the aggregation results in a store that actually persists state. The illustration in the preceding code keeps all the aggregates in memory; the snippet can be conveniently rewritten to persist the values to an in-memory database (IMDB) such as memcached or Hazelcast, or to stable storage such as Cassandra.
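Trident hides the backing store behind a map-state abstraction with bulk get/put operations. The following is a hand-rolled sketch of that contract, not Storm's actual interface; replacing the HashMap with a memcached or Cassandra client is what turns it into real persistence:

```java
import java.util.*;

// Minimal sketch of a map-state backend: values are read and written
// in bulk so a remote store (memcached, Cassandra, ...) can be hit
// with one round trip per batch instead of one per key.
public class InMemoryMapState {
    private final Map<String, Long> store = new HashMap<>();

    public List<Long> multiGet(List<String> keys) {
        List<Long> vals = new ArrayList<>();
        for (String key : keys) {
            vals.add(store.getOrDefault(key, 0L)); // unseen word starts at 0
        }
        return vals;
    }

    public void multiPut(List<String> keys, List<Long> vals) {
        for (int i = 0; i < keys.size(); i++) {
            store.put(keys.get(i), vals.get(i));
        }
    }
}
```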
Trident with Storm is popular because it guarantees the processing of all tuples in a fail-safe manner with exactly-once semantics. Even when a batch must be retried because of a failure, its state updates are applied exactly once, so as a developer I don't end up updating the table storage multiple times on the occurrence of a failure.
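A common way to achieve this guarantee is to store, next to each value, the transaction ID of the last batch that updated it; a replayed batch carrying the same ID is skipped. The following is a minimal sketch of that idea (illustrative, not Trident's internals):

```java
// Sketch of transactional state: each value remembers the txid of the
// batch that last updated it, so a replayed batch is applied at most once.
public class TransactionalCounter {
    private long storedTxid = -1;
    private long count = 0;

    public void applyBatch(long txid, long increment) {
        if (txid == storedTxid) {
            return; // same batch replayed after a failure: already applied
        }
        count += increment;
        storedTxid = txid;
    }

    public long getCount() {
        return count;
    }
}
```

This is why a retry after a partial failure does not double-count: the replayed batch arrives with the txid already recorded in the store.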
Trident works on micro-batching, dividing the incoming stream into very small batches, as shown in the following figure:
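The batching itself can be pictured as chopping the incoming tuple stream into small fixed-size groups, each processed (and acknowledged) as a unit. A simple sketch follows; the batch size and class name are illustrative:

```java
import java.util.*;

public class MicroBatcher {
    // Splits a stream of tuples into consecutive batches of at most
    // batchSize elements; the last batch may be smaller.
    static <T> List<List<T>> batch(List<T> stream, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < stream.size(); i += batchSize) {
            int end = Math.min(i + batchSize, stream.size());
            batches.add(new ArrayList<>(stream.subList(i, end)));
        }
        return batches;
    }
}
```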