considered to be mutually exclusive. There is no reason that the two cannot
both be used in a single environment according to need.
Distributed Data Flows
A distributed data flow system must address two fundamental requirements.
The first is an "at least once" delivery semantic. The second
is solving the "n+1" delivery problem. Without these, a distributed data
flow will have difficulty scaling successfully. This section covers these two
components and explains why they are so important to a distributed data flow.
At Least Once Delivery
There are three options for data delivery and processing in any sort of data
collection framework ("at least once" delivery is sketched in code after the list):
• At most once delivery
• At least once delivery
• Exactly once delivery
Many processing frameworks, particularly those used for system
monitoring, provide “at most once” delivery and processing semantics.
Largely, this is because the situations they were designed to handle do
not require that all the data be transmitted, but they do require maximum
performance to alert administrators to problems. In fact, many of these
systems down-sample the data to further improve performance. As long as
the rate of data loss is approximately known, the monitoring software can
recover a usable value during processing.
In other systems, such as financial systems or advertising systems
where logs are used to determine fees, every lost data record means lost
revenue. Furthermore, audit requirements often mean that this data loss
cannot be estimated with the techniques used in the monitoring space. In
this case, most implementations turn to "exactly once" delivery through
queuing systems. Popular examples include Apache ActiveMQ and RabbitMQ,
along with innumerable commercial solutions. These servers typically
implement their queue semantics on the server side, primarily because they
are designed to support a variety of producers and consumers in an
enterprise setting.