% flume-ng agent --conf-file spool-to-hdfs-tiered.properties --name agent2 ...
Delivery Guarantees
Flume uses transactions to ensure that each batch of events is reliably delivered from a
source to a channel, and from a channel to a sink. In the context of the Avro sink-source
connection, transactions ensure that events are reliably delivered from one agent to the
next.
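As an illustration, the Avro hop between the two agents might be wired up along these lines (component names and the host/port are illustrative; a full spool-to-hdfs-tiered.properties would also define agent1's spooling directory source and agent2's HDFS sink):

```
# agent1: Avro sink reads from the file channel and sends to agent2
agent1.sinks.sink1.type = avro
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink1.hostname = collector-host
agent1.sinks.sink1.port = 12345

# agent2: Avro source receives batches and writes to its own file channel
agent2.sources.source2.type = avro
agent2.sources.source2.channels = channel2
agent2.sources.source2.bind = collector-host
agent2.sources.source2.port = 12345
```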
The operation to read a batch of events from the file channel in agent1 by the Avro sink
will be wrapped in a transaction. The transaction will only be committed once the Avro
sink has received the (synchronous) confirmation that the write to the Avro source's RPC
endpoint was successful. This confirmation will only be sent once agent2's transaction
wrapping the operation to write the batch of events to its file channel has been successfully
committed. Thus, the Avro sink-source pair guarantees that an event is delivered
from one Flume agent's channel to another Flume agent's channel (at least once).
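The commit ordering described above is what yields at-least-once delivery: the upstream transaction commits only after the downstream ack, so a failure anywhere before that point leaves the batch in the upstream channel for redelivery. A minimal sketch of that protocol (hypothetical names; Flume's real implementation is in Java, inside its channel and sink classes):

```python
class Channel:
    """Toy stand-in for a Flume file channel with staged (transactional) takes."""
    def __init__(self):
        self.events = []
        self._staged = []

    def put(self, batch):
        self.events.extend(batch)

    def take(self, n):
        # Stage a batch inside a transaction; nothing is removed yet.
        self._staged = self.events[:n]
        return list(self._staged)

    def commit(self):
        # Remove staged events only after the downstream write is confirmed.
        del self.events[:len(self._staged)]
        self._staged = []

    def rollback(self):
        # Staged events stay in the channel, to be redelivered later.
        self._staged = []


def deliver(upstream, downstream, batch_size):
    """Sketch of the Avro sink/source hop: commit the upstream transaction
    only after the downstream channel has accepted the batch (the
    synchronous ack). A failure triggers rollback, hence at-least-once."""
    batch = upstream.take(batch_size)
    try:
        downstream.put(batch)   # agent2 writes the batch to its channel
        upstream.commit()       # ack received: safe to discard upstream copy
    except Exception:
        upstream.rollback()     # batch remains for redelivery
```

Note that a crash after the downstream write but before the upstream commit causes the same batch to be redelivered, which is why the guarantee is "at least once" rather than "exactly once".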
If either agent is not running, then clearly events cannot be delivered to HDFS. For
example, if agent1 stops running, then files will accumulate in the spooling directory, to
be processed once agent1 starts up again. Also, any events in an agent's own file channel
at the point the agent stopped running will be available on restart, due to the durability
guarantee that the file channel provides.
If agent2 stops running, then events will be stored in agent1's file channel until
agent2 starts again. Note, however, that channels necessarily have a limited capacity; if
agent1's channel fills up while agent2 is not running, then any new events will be
lost. By default, a file channel will not recover more than one million events (this can be
overridden by its capacity property), and it will stop accepting events if the free disk
space for its checkpoint directory falls below 500 MB (controlled by the
minimumRequiredSpace property).
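Both limits can be raised in the channel's configuration. A sketch, assuming the component names used above (the values shown are illustrative, not recommendations):

```
# Raise the file channel's capacity above the one-million-event default,
# and require 1 GiB of free checkpoint-directory space instead of 500 MB
agent1.channels.channel1.type = file
agent1.channels.channel1.capacity = 2000000
agent1.channels.channel1.minimumRequiredSpace = 1073741824
```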
Both these scenarios assume that the agent will eventually recover, but that is not always
the case (if the hardware it is running on fails, for example). If agent1 doesn't recover,
then the loss is limited to the events in its file channel that had not been delivered to
agent2 before agent1 shut down. In the architecture described here, there are multiple
first-tier agents like agent1, so other nodes in the tier can take over the function of the
failed node. For example, if the nodes are running load-balanced web servers, then other
nodes will absorb the failed web server's traffic, and they will generate new Flume events
that are delivered to agent2. Thus, no new events are lost.