Now my spout reads the stream and emits it into a field labeled sentence . In the next step, we split the sentence into words; this is the very same functionality we deployed earlier in the wordCount topology.
The following code shows the working of the split functionality:
public class Split extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
        // Each incoming tuple carries one sentence in field 0
        String sentence = tuple.getString(0);
        for (String word : sentence.split(" ")) {
            collector.emit(new Values(word));
        }
    }
}
This very simple function splits the sentence on whitespace and emits each word as a tuple. Beyond this point, the topology computes the count and stores the results persistently. It does so in the following steps:
1. We group the stream by the word field.
2. We aggregate and persist each group using the count aggregator.
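In Trident these two steps correspond to groupBy followed by persistentAggregate with the built-in Count aggregator. The following is a plain-Java sketch of what that grouping and counting amounts to for a single batch; the class and method names are illustrative, not Storm's API:

```java
import java.util.*;

public class GroupAndCount {
    // Groups words and counts occurrences per group, mirroring what
    // groupBy on the word field plus a Count aggregator produce for one batch.
    static Map<String, Long> countWords(List<String> words) {
        Map<String, Long> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1L, Long::sum); // increment this group's count
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(
                Arrays.asList("the", "cow", "jumped", "over", "the", "moon")));
    }
}
```

In the real topology, the per-batch counts produced this way are merged into the persistent state rather than printed.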
The persistent function should be written so that it stores the aggregation results in a store that actually persists state. The illustration in the preceding code keeps all the aggregates in memory; the snippet can be conveniently rewritten to persist the values to an in-memory database (IMDB) such as memcached or Hazelcast, or to stable storage such as Cassandra.
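Trident hides the backing store behind a map-state abstraction with bulk get/put operations. The following is a hand-rolled sketch of that contract, not Storm's actual interface; replacing the HashMap with a memcached or Cassandra client is what turns it into real persistence:

```java
import java.util.*;

// Minimal sketch of a map-state backend: values are read and written
// in bulk so a remote store (memcached, Cassandra, ...) can be hit
// with one round trip per batch instead of one per key.
public class InMemoryMapState {
    private final Map<String, Long> store = new HashMap<>();

    public List<Long> multiGet(List<String> keys) {
        List<Long> vals = new ArrayList<>();
        for (String key : keys) {
            vals.add(store.getOrDefault(key, 0L)); // unseen word starts at 0
        }
        return vals;
    }

    public void multiPut(List<String> keys, List<Long> vals) {
        for (int i = 0; i < keys.size(); i++) {
            store.put(keys.get(i), vals.get(i));
        }
    }
}
```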
Trident with Storm is popular because it guarantees the processing of all tuples in a fail-safe manner with exactly-once semantics. Even when a batch must be retried because of a failure, its state updates are applied exactly once, so as a developer I don't end up updating the table storage multiple times on the occurrence of a failure.
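A common way to achieve this guarantee is to store, next to each value, the transaction ID of the last batch that updated it; a replayed batch carrying the same ID is skipped. The following is a minimal sketch of that idea (illustrative, not Trident's internals):

```java
// Sketch of transactional state: each value remembers the txid of the
// batch that last updated it, so a replayed batch is applied at most once.
public class TransactionalCounter {
    private long storedTxid = -1;
    private long count = 0;

    public void applyBatch(long txid, long increment) {
        if (txid == storedTxid) {
            return; // same batch replayed after a failure: already applied
        }
        count += increment;
        storedTxid = txid;
    }

    public long getCount() {
        return count;
    }
}
```

This is why a retry after a partial failure does not double-count: the replayed batch arrives with the txid already recorded in the store.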
Trident works on micro-batching, dividing the incoming stream into very small batches, as shown in the following figure:
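The batching itself can be pictured as chopping the incoming tuple stream into small fixed-size groups, each processed (and acknowledged) as a unit. A simple sketch follows; the batch size and class name are illustrative:

```java
import java.util.*;

public class MicroBatcher {
    // Splits a stream of tuples into consecutive batches of at most
    // batchSize elements; the last batch may be smaller.
    static <T> List<List<T>> batch(List<T> stream, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < stream.size(); i += batchSize) {
            int end = Math.min(i + batchSize, stream.size());
            batches.add(new ArrayList<>(stream.subList(i, end)));
        }
        return batches;
    }
}
```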