Timed Counting and Summation
The simplest form of counting and summation is the timed counter. In this
case, any timestamp information associated with the event is ignored, and
events are simply processed in the order in which they arrive. This works
best with collection mechanisms that make no attempt to deliver data
reliably, so it is mostly applicable to monitoring applications.
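As an illustration of the idea, here is a minimal sketch in plain Java, independent of Storm. The class name ArrivalOrderCounter and its methods are hypothetical, and string event keys are an assumption. The event timestamp is accepted but never consulted, so every event lands in whichever window is open when it arrives:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Storm or any library.
final class ArrivalOrderCounter {
    private Map<String, Long> counts = new HashMap<>();

    // Count the event in the currently open window.
    // The event's own timestamp is deliberately ignored.
    void record(String key, long eventTimestampMillis) {
        counts.merge(key, 1L, Long::sum);
    }

    // Called once per interval: hand back the accumulated counts
    // and start a fresh, empty window.
    Map<String, Long> flush() {
        Map<String, Long> snapshot = counts;
        counts = new HashMap<>();
        return snapshot;
    }
}
```

Because late or reordered events are simply attributed to the current window, the counts are only approximately aligned with event time, which is the trade-off described above.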
These sorts of counters are also well suited to Lambda Architectures. In
this architecture, a second process corrects for any data loss or
overcounting in the stream-processing environment. Under normal
circumstances, events coming out of a system such as Kafka may arrive out
of order, but usually not by enough to make a significant difference. It is
also relatively rare for events to be duplicated because data had to be
reread.
In both cases, the basic counting systems described in the following sections
can be used. They are easy to implement and require little overhead.
However, they are not suitable for systems that need to be highly accurate
or require idempotent operation.
Counting in Bolts
When Storm was first released, it did not contain any primitive operations.
Any sort of counting operation was implemented by writing an IRichBolt
that performed the counting task. To produce output over time, a
background thread was typically used to emit counts at a fixed interval and
then reset the in-memory map. In newer versions of Storm (0.8.0 and later),
the background thread is no longer needed because Storm can produce ticker
events (tick tuples) itself.
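The older background-thread pattern can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the code used by Storm or by any particular bolt: BackgroundFlushCounter is a hypothetical class, and the emitter callback stands in for whatever forwards the counts to the next bolt. A lock guards the map because the flushing thread and the event-processing thread both touch it:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for the pre-tick pattern; it uses no Storm classes.
final class BackgroundFlushCounter implements AutoCloseable {
    private final Object lock = new Object();
    private final Consumer<Map<String, Long>> emitter;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private Map<String, Long> counts = new HashMap<>();

    BackgroundFlushCounter(long period, TimeUnit unit,
                           Consumer<Map<String, Long>> emitter) {
        this.emitter = emitter;
        // Background thread: emit and reset the counts at a fixed interval.
        scheduler.scheduleAtFixedRate(this::flush, period, period, unit);
    }

    // Called from the event-processing thread for each incoming event.
    void record(String key) {
        synchronized (lock) {
            counts.merge(key, 1L, Long::sum);
        }
    }

    // Swap in an empty map under the lock, then emit outside the lock.
    void flush() {
        Map<String, Long> snapshot;
        synchronized (lock) {
            snapshot = counts;
            counts = new HashMap<>();
        }
        if (!snapshot.isEmpty()) {
            emitter.accept(snapshot);
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With ticker events, the periodic signal arrives on the same thread that processes ordinary events, so the extra thread and the locking shown here are no longer needed.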
Implementing a basic aggregation bolt begins like any other bolt
implementation. In this case, the bolt takes only a single parameter, which
is the number of seconds to wait before emitting counts to the next bolt:
public class EventCounterBolt implements IRichBolt {
    int updates = 10;

    public EventCounterBolt updateSeconds(int updates) {
        this.updates = updates;
        return this;
    }

    public int updateSeconds() { return updates; }