Data-Flow Management in Streaming Analysis - Real-Time Analytics

import java.util.Map;

/**
 * Basic representation of a data object in Flume.
 * Provides access to data as it flows through the system.
 */
public interface Event {

  public Map<String, String> getHeaders();

  public void setHeaders(Map<String, String> headers);

  public byte[] getBody();

  public void setBody(byte[] body);

}

Like the Kafka Message, the Event payload is an opaque byte array, called the Body. The Event also carries headers, much like an HTTP request. These headers hold metadata that is not part of the message itself.
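The split between an opaque body and string headers can be illustrated with a minimal sketch. It assumes a hypothetical in-memory implementation, BasicEvent, and a hypothetical helper, newEvent; neither name comes from Flume, and the nested Event interface simply repeats the four methods shown above so that the example compiles on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class EventSketch {

    // Same four methods as the Event interface shown above.
    public interface Event {
        Map<String, String> getHeaders();
        void setHeaders(Map<String, String> headers);
        byte[] getBody();
        void setBody(byte[] body);
    }

    // Hypothetical in-memory implementation, for illustration only.
    public static class BasicEvent implements Event {
        private Map<String, String> headers = new HashMap<>();
        private byte[] body = new byte[0];

        public Map<String, String> getHeaders() { return headers; }
        public void setHeaders(Map<String, String> headers) { this.headers = headers; }
        public byte[] getBody() { return body; }
        public void setBody(byte[] body) { this.body = body; }
    }

    // Hypothetical helper: the text becomes the opaque body (UTF-8 bytes),
    // and the originating host is recorded as metadata in a "host" header.
    public static Event newEvent(String text, String host) {
        Event event = new BasicEvent();
        event.setBody(text.getBytes(StandardCharsets.UTF_8));
        event.getHeaders().put("host", host);
        return event;
    }

    public static void main(String[] args) {
        Event event = newEvent("GET /index.html 200", "web-01.example.com");
        System.out.println("headers = " + event.getHeaders());
        System.out.println("body bytes = " + event.getBody().length);
    }
}
```

The body stays uninterpreted bytes throughout; anything a downstream component needs to inspect cheaply goes into the headers instead.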

Headers are usually added by the Source to attach metadata to the opaque event. Examples include the host name of the machine that generated the event or a digest signature of the event data. Many Interceptors modify the header metadata as well. Flume uses this metadata to help direct events to their appropriate destination. For example, the multiplexing selector uses the headers to determine the destination channel for an event. How each component modifies or uses the metadata is discussed in detail in the sections devoted to each selector.
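As a rough sketch of the two header examples mentioned above, the following code attaches a host name and a digest of the body to a header map. The class name, the tag method, and the header keys "host" and "digest" are assumptions made for illustration, not Flume's own names, and SHA-256 is chosen arbitrarily as the digest algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class HeaderTaggingSketch {

    // Returns the lowercase hexadecimal SHA-256 digest of the body bytes.
    public static String sha256Hex(byte[] body) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(body)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: returns a copy of the headers with "host" and
    // "digest" entries added. The body itself is left untouched.
    public static Map<String, String> tag(Map<String, String> headers,
                                          byte[] body, String host) {
        Map<String, String> tagged = new HashMap<>(headers);
        tagged.put("host", host);
        tagged.put("digest", sha256Hex(body));
        return tagged;
    }

    public static void main(String[] args) {
        byte[] body = "GET /index.html 200".getBytes(StandardCharsets.UTF_8);
        Map<String, String> headers = tag(new HashMap<>(), body, "web-01.example.com");
        System.out.println(headers);
    }
}
```

Passing the host name in as a parameter keeps the sketch deterministic; a real source would more likely look it up from the machine it runs on.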

Channel Selectors

If a source writes to multiple channels, as the agent_1 source does in the previous example, some policy must distribute data across those channels. Flume's default policy is a replicating selector, which copies each event to every channel. Flume also has a built-in multiplexing selector, which is used to load balance or partition inputs to multiple sinks. It is also possible to implement custom channel selection behavior.
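The difference between the two built-in policies can be sketched in a few lines. This is a simplified model, not Flume's implementation: channels are represented by their names, and the class, the method names, and the "tier" header are all hypothetical. The multiplexing variant takes a header name, a mapping from header values to channels, and a default channel list for events that match no mapping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SelectorSketch {

    // Replicating policy: every event goes to every configured channel.
    public static List<String> replicate(List<String> channels) {
        return new ArrayList<>(channels);
    }

    // Multiplexing policy: the value of one header picks the channels.
    // Events with a missing or unmapped header value go to the defaults.
    public static List<String> multiplex(Map<String, String> headers,
                                         String headerName,
                                         Map<String, List<String>> mapping,
                                         List<String> defaults) {
        String value = headers.get(headerName);
        List<String> target = (value == null) ? null : mapping.get(value);
        return (target != null) ? target : defaults;
    }

    public static void main(String[] args) {
        List<String> channels = Arrays.asList("channel_1", "channel_2");

        Map<String, List<String>> mapping = new HashMap<>();
        mapping.put("web", Collections.singletonList("channel_1"));
        mapping.put("app", Collections.singletonList("channel_2"));

        Map<String, String> headers = new HashMap<>();
        headers.put("tier", "web");

        System.out.println("replicating:  " + replicate(channels));
        System.out.println("multiplexing: " + multiplex(headers, "tier", mapping, channels));
    }
}
```

With the mapping above, an event whose "tier" header is "web" is routed only to channel_1, while the replicating policy would deliver the same event to both channels.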
