In many cases, the stream processor can share the ZooKeeper cluster being used to manage the data motion framework without much risk.
Partitions and Merges
The core element of a stream processing system is some form of
scatter-gather implementation. An incoming stream is split among a number
of identical processing pipelines. Depending on the performance
characteristics of each step in the pipeline, the output of one step may be
further subdivided before the next step to maintain throughput.
For example, if step two of the computation requires twice as much time
as step one, it would be necessary to at least double the number of work
units assigned to the second step in the pipeline compared to the first. In all
likelihood, more than double that number would be required, because
communication between work units introduces some overhead into the
process.
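As a back-of-the-envelope illustration, here is a minimal sketch of this sizing arithmetic in Python. The per-step latencies and the 20 percent communication overhead factor are assumed numbers for illustration, not measurements from any real pipeline:

    import math

    # Per-record service time of each pipeline step, in milliseconds.
    # These figures are assumptions; measure your own pipeline.
    step_latency_ms = [5.0, 10.0, 7.5]

    # Assumed 20 percent overhead for communication between work units.
    overhead = 1.2

    # Work units needed per step so that every step sustains the same
    # record rate as the fastest step running on a single unit.
    fastest = min(step_latency_ms)
    units = [math.ceil(overhead * latency / fastest)
             for latency in step_latency_ms]
    print(units)  # [2, 3, 2]: the 10 ms step needs more than double the units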
Data exchange between the various components of the processing
application is usually a peer-to-peer affair. Each component knows that it
must produce data for downstream consumers, and most systems establish
a mechanism for transferring data between processing units and for
controlling the degree of parallelism at each stage of the computation.
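To illustrate the scatter step of such an exchange, the following sketch routes records among downstream workers by hashing a key, using in-process queues in place of real network transport. The four-way parallelism and the worker bodies are arbitrary assumptions; real frameworks handle this routing for you:

    import queue
    import threading

    NUM_WORKERS = 4  # downstream parallelism (arbitrary choice)
    inboxes = [queue.Queue() for _ in range(NUM_WORKERS)]

    def route(record_key, record):
        # Hash partitioning: records with the same key always reach the
        # same downstream worker, preserving per-key ordering.
        inboxes[hash(record_key) % NUM_WORKERS].put(record)

    def worker(inbox):
        while True:
            record = inbox.get()
            ...  # step-two computation goes here (placeholder)

    for inbox in inboxes:
        threading.Thread(target=worker, args=(inbox,), daemon=True).start()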
Transactions
Transactional processing is a challenge for real-time systems, because
maintaining transactional guarantees imposes significant overhead on the
system. If it is possible to operate without transactions, do so. If it is not,
some frameworks can provide a degree of transactional support when
coupled with an appropriate data motion system.
The basic mechanism for transaction handling in real-time frameworks is
the rollback and retry. The framework begins by generating a checkpoint
of the input stream. This is easier to do in queuing or data motion systems
where each element has a unique identifier that increases monotonically
over the relevant time range. In Kafka, this is achieved by recording the
offset for each of the partitions in the topic(s) being used as input to the
real-time processing system.
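To make the pattern concrete, here is a minimal sketch of checkpointing and rollback-and-retry against Kafka using the kafka-python client. The topic name and the processing stub are assumptions, and a real framework would manage this offset bookkeeping internally rather than in application code:

    from kafka import KafkaConsumer

    def process_batch(batch):
        for tp, records in batch.items():
            for record in records:
                ...  # per-record computation goes here (placeholder)

    consumer = KafkaConsumer(
        "events",                      # hypothetical input topic
        bootstrap_servers="localhost:9092",
        enable_auto_commit=False,      # offsets serve as the checkpoint
    )
    consumer.poll(timeout_ms=1000)     # trigger partition assignment

    while True:
        # Checkpoint: record the current offset of every assigned partition.
        checkpoint = {tp: consumer.position(tp)
                      for tp in consumer.assignment()}
        batch = consumer.poll(timeout_ms=1000)
        try:
            process_batch(batch)
        except Exception:
            # Rollback and retry: rewind each partition to the checkpoint
            # so the failed batch is re-read on the next iteration.
            for tp, offset in checkpoint.items():
                consumer.seek(tp, offset)

Because each partition's offsets increase monotonically, rewinding to the recorded offsets replays exactly the records of the failed batch, which is what makes this scheme workable.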