Advance Concepts in Storm - Real-time Analytics with Storm and Cassandra

Database Reference

In-Depth Information

Aggregator

An Aggregator function is the most commonly used and versatile aggregator function.

It has the ability to emit one or more tuples, and each can have any number of fields. They

have the following interface signature:

public interface Aggregator<T> extends Operation {

T init(Object batchId, TridentCollector collector);

void aggregate(T state, TridentTuple tuple,

TridentCollector collector);

void complete(T state, TridentCollector collector);

}

The execution pattern is as follows:

• The init method is a predecessor to processing of every batch. It's called before

the processing of each batch. On completion, it returns an object holding the state

representation of the batch, and this is passed on to the subsequent aggregate and

complete methods.

• Unlike the init method, the aggregate method is called once for every tuple

in the batch partition. This method can store the state, and can emit the results de-

pending upon functionality requirements.

• The complete method is like a postprocessor; it's executed at the end, when the

batch partition has been completely processed by the aggregate.

The following is the implementation of the count as an aggregator function:

public class CountAggregate extends

BaseAggregator<CountState> {

static class CountState {

long count = 0;

}

public CountState init(Object batchId,

TridentCollector collector) {

return new CountState();

}

public void aggregate(CountState state, TridentTuple

tuple, TridentCollector collector) {

state.count+=1;

}

Search WWH ::

Custom Search

Home