In many cases, the stream processor can share the ZooKeeper cluster being used to manage the data motion framework without much risk.
Partitions and Merges
The core element of a stream processing system is some form of
scatter-gather implementation. An incoming stream is split among a number
of identical processing pipelines. Depending on the performance
characteristics of each step in the pipeline, the output of one step may be
further subdivided before the next step to maintain throughput.
For example, if step two of the computation requires twice as much time
as step one, it would be necessary to at least double the number of work
units assigned to the second step in the pipeline compared to the first. In all
likelihood, more than double that number would be required, because
communication between work units introduces some overhead into the
process.
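As a back-of-the-envelope illustration, here is a minimal sketch of this sizing arithmetic in Python. The per-step latencies and the 20 percent communication overhead factor are assumed numbers for illustration, not measurements from any real pipeline:

    import math

    # Per-record service time of each pipeline step, in milliseconds.
    # These figures are assumptions; measure your own pipeline.
    step_latency_ms = [5.0, 10.0, 7.5]

    # Assumed 20 percent overhead for communication between work units.
    overhead = 1.2

    # Work units needed per step so that every step sustains the same
    # record rate as the fastest step running on a single unit.
    fastest = min(step_latency_ms)
    units = [math.ceil(overhead * latency / fastest)
             for latency in step_latency_ms]
    print(units)  # [2, 3, 2]: the 10 ms step needs more than double the units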
Data exchange between the various components of the processing
application is usually a peer-to-peer affair. Each component knows that it
must produce data for downstream consumers, and most systems establish
a mechanism for transferring data between processing units and for
controlling the degree of parallelism at each stage of the computation.
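To illustrate the scatter step of such an exchange, the following sketch routes records among downstream workers by hashing a key, using in-process queues in place of real network transport. The four-way parallelism and the worker bodies are arbitrary assumptions; real frameworks handle this routing for you:

    import queue
    import threading

    NUM_WORKERS = 4  # downstream parallelism (arbitrary choice)
    inboxes = [queue.Queue() for _ in range(NUM_WORKERS)]

    def route(record_key, record):
        # Hash partitioning: records with the same key always reach the
        # same downstream worker, preserving per-key ordering.
        inboxes[hash(record_key) % NUM_WORKERS].put(record)

    def worker(inbox):
        while True:
            record = inbox.get()
            ...  # step-two computation goes here (placeholder)

    for inbox in inboxes:
        threading.Thread(target=worker, args=(inbox,), daemon=True).start()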
Transactions
Transactional processing is a challenge for real-time systems, because
maintaining transactional guarantees imposes significant overhead on the
system. If it is possible to operate without transactions, do so. If it is not,
some frameworks can provide a degree of transactional support when
coupled with an appropriate data motion system.
The basic mechanism for transaction handling in real-time frameworks is
the rollback and retry. The framework begins by generating a checkpoint
of the input stream. This is easier to do in queuing or data motion systems
where each element has a unique identifier that increases monotonically
over the relevant time range. In Kafka, this is achieved by recording the
offset for each of the partitions in the topic(s) being used as input to the
real-time processing system.
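To make the pattern concrete, here is a minimal sketch of checkpointing and rollback-and-retry against Kafka using the kafka-python client. The topic name and the processing stub are assumptions, and a real framework would manage this offset bookkeeping internally rather than in application code:

    from kafka import KafkaConsumer

    def process_batch(batch):
        for tp, records in batch.items():
            for record in records:
                ...  # per-record computation goes here (placeholder)

    consumer = KafkaConsumer(
        "events",                      # hypothetical input topic
        bootstrap_servers="localhost:9092",
        enable_auto_commit=False,      # offsets serve as the checkpoint
    )
    consumer.poll(timeout_ms=1000)     # trigger partition assignment

    while True:
        # Checkpoint: record the current offset of every assigned partition.
        checkpoint = {tp: consumer.position(tp)
                      for tp in consumer.assignment()}
        batch = consumer.poll(timeout_ms=1000)
        try:
            process_batch(batch)
        except Exception:
            # Rollback and retry: rewind each partition to the checkpoint
            # so the failed batch is re-read on the next iteration.
            for tp, offset in checkpoint.items():
                consumer.seek(tp, offset)

Because each partition's offsets increase monotonically, rewinding to the recorded offsets replays exactly the records of the failed batch, which is what makes this scheme workable.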