Database Reference
In-Depth Information
Chapter 8
Exact Aggregation and Delivery
Now that the infrastructure is in place to collect, process, store, and deliver
streaming data, the time has come to put it all together. The core of most
applications is the aggregation of data coming through the stream, which is
the subject of this chapter.
The place to begin is basic aggregation—counting and summation of various
elements. This basic task has varying levels of native support in the popular
stream processing frameworks. Probably the most advanced is the support
provided by the Trident language portion of Storm, but that assumes a
willingness to work within Trident's assumptions. Samza has some support
for internal aggregation using its KeyValueStore interfaces, but it leaves
external aggregation and delivery to the user to implement. Basic Storm
topologies, of course, have no primitives to speak of and require the user to
implement all aspects of the topology. This chapter covers aggregation in all
three frameworks using a common example.
Most real-time analysis is somehow related to time-series analysis. This
chapter also introduces a method for performing multi-resolution
aggregation of time series for later delivery. Additionally, because the data
are being transported via mechanisms that can potentially reorder events,
this multi-resolution approach relies on record-level timestamps rather than
simply generating results on a fixed interval.
Finally, after data has been collected and aggregated into a time series, it
must be delivered to the client. Using the techniques introduced in the last
chapter, time-series data can be delivered to the client for rendering. There
are a variety of rendering options for time-series data, but the most popular
is the simple strip chart because it's easy to understand. An alternative to the
strip chart is the horizon chart, which allows for many correlated time series
to be displayed simultaneously. The standard rendering techniques used can
sometimes be too slow when the data delivery rate is high. At the expense of
flexibility, high-performance techniques can be used to provide charts with
very high update rates.
Search WWH ::




Custom Search