Database Reference
In-Depth Information
Aggregator
An Aggregator function is the most commonly used and versatile aggregator function.
It has the ability to emit one or more tuples, and each can have any number of fields. They
have the following interface signature:
public interface Aggregator<T> extends Operation {
T init(Object batchId, TridentCollector collector);
void aggregate(T state, TridentTuple tuple,
TridentCollector collector);
void complete(T state, TridentCollector collector);
}
The execution pattern is as follows:
• The init method is a predecessor to processing of every batch. It's called before
the processing of each batch. On completion, it returns an object holding the state
representation of the batch, and this is passed on to the subsequent aggregate and
complete methods.
• Unlike the init method, the aggregate method is called once for every tuple
in the batch partition. This method can store the state, and can emit the results de-
pending upon functionality requirements.
• The complete method is like a postprocessor; it's executed at the end, when the
batch partition has been completely processed by the aggregate.
The following is the implementation of the count as an aggregator function:
public class CountAggregate extends
BaseAggregator<CountState> {
static class CountState {
long count = 0;
}
public CountState init(Object batchId,
TridentCollector collector) {
return new CountState();
}
public void aggregate(CountState state, TridentTuple
tuple, TridentCollector collector) {
state.count+=1;
}
Search WWH ::




Custom Search