% flume-ng agent --conf-file spool-to-hdfs-tiered.properties --name agent2 ...
Delivery Guarantees
Flume uses transactions to ensure that each batch of events is reliably delivered from a
source to a channel, and from a channel to a sink. In the context of the Avro sink-source
connection, transactions ensure that events are reliably delivered from one agent to the
next.
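As an illustration, the Avro hop between the two agents might be wired up along these lines (component names and the host/port are illustrative; a full spool-to-hdfs-tiered.properties would also define agent1's spooling directory source and agent2's HDFS sink):

```
# agent1: Avro sink reads from the file channel and sends to agent2
agent1.sinks.sink1.type = avro
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink1.hostname = collector-host
agent1.sinks.sink1.port = 12345

# agent2: Avro source receives batches and writes to its own file channel
agent2.sources.source2.type = avro
agent2.sources.source2.channels = channel2
agent2.sources.source2.bind = collector-host
agent2.sources.source2.port = 12345
```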
The operation to read a batch of events from the file channel in agent1 by the Avro sink
will be wrapped in a transaction. The transaction will only be committed once the Avro
sink has received the (synchronous) confirmation that the write to the Avro source's RPC
endpoint was successful. This confirmation will only be sent once agent2's transaction
wrapping the operation to write the batch of events to its file channel has been successfully
committed. Thus, the Avro sink-source pair guarantees that an event is delivered
from one Flume agent's channel to another Flume agent's channel (at least once).
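The commit ordering described above is what yields at-least-once delivery: the upstream transaction commits only after the downstream ack, so a failure anywhere before that point leaves the batch in the upstream channel for redelivery. A minimal sketch of that protocol (hypothetical names; Flume's real implementation is in Java, inside its channel and sink classes):

```python
class Channel:
    """Toy stand-in for a Flume file channel with staged (transactional) takes."""
    def __init__(self):
        self.events = []
        self._staged = []

    def put(self, batch):
        self.events.extend(batch)

    def take(self, n):
        # Stage a batch inside a transaction; nothing is removed yet.
        self._staged = self.events[:n]
        return list(self._staged)

    def commit(self):
        # Remove staged events only after the downstream write is confirmed.
        del self.events[:len(self._staged)]
        self._staged = []

    def rollback(self):
        # Staged events stay in the channel, to be redelivered later.
        self._staged = []


def deliver(upstream, downstream, batch_size):
    """Sketch of the Avro sink/source hop: commit the upstream transaction
    only after the downstream channel has accepted the batch (the
    synchronous ack). A failure triggers rollback, hence at-least-once."""
    batch = upstream.take(batch_size)
    try:
        downstream.put(batch)   # agent2 writes the batch to its channel
        upstream.commit()       # ack received: safe to discard upstream copy
    except Exception:
        upstream.rollback()     # batch remains for redelivery
```

Note that a crash after the downstream write but before the upstream commit causes the same batch to be redelivered, which is why the guarantee is "at least once" rather than "exactly once".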
If either agent is not running, then clearly events cannot be delivered to HDFS. For
example, if agent1 stops running, then files will accumulate in the spooling directory, to
be processed once agent1 starts up again. Also, any events in an agent's own file channel
at the point the agent stopped running will be available on restart, due to the durability
guarantee that the file channel provides.
If agent2 stops running, then events will be stored in agent1's file channel until
agent2 starts again. Note, however, that channels necessarily have a limited capacity; if
agent1's channel fills up while agent2 is not running, then any new events will be
lost. By default, a file channel will not recover more than one million events (this can be
overridden by its capacity property), and it will stop accepting events if the free disk
space for its checkpoint directory falls below 500 MB (controlled by the
minimumRequiredSpace property).
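Both limits can be raised in the channel's configuration. A sketch, assuming the component names used above (the values shown are illustrative, not recommendations):

```
# Raise the file channel's capacity above the one-million-event default,
# and require 1 GiB of free checkpoint-directory space instead of 500 MB
agent1.channels.channel1.type = file
agent1.channels.channel1.capacity = 2000000
agent1.channels.channel1.minimumRequiredSpace = 1073741824
```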
Both these scenarios assume that the agent will eventually recover, but that is not always
the case (if the hardware it is running on fails, for example). If agent1 doesn't recover,
then the loss is limited to the events in its file channel that had not been delivered to
agent2 before agent1 shut down. In the architecture described here, there are multiple
first-tier agents like agent1, so other nodes in the tier can take over the function of the
failed node. For example, if the nodes are running load-balanced web servers, then other
nodes will absorb the failed web server's traffic, and they will generate new Flume events
that are delivered to agent2. Thus, no new events are lost.