Distribution: Agent Tiers
How do we scale a set of Flume agents? If there is one agent running on every node producing raw data, then with the setup described so far, at any particular time each file being written to HDFS will consist entirely of the events from one node. It would be better if we could aggregate the events from a group of nodes in a single file, since this would result in fewer, larger files (with the concomitant reduction in pressure on HDFS, and more efficient processing in MapReduce; see Small files and CombineFileInputFormat). Also, if needed, files can be rolled more often since they are being fed by a larger number of nodes, leading to a reduction in the time between when an event is created and when it is available for analysis.
Aggregating Flume events is achieved by having tiers of Flume agents. The first tier col-
lects events from the original sources (such as web servers) and sends them to a smaller set
of agents in the second tier, which aggregate events from the first tier before writing them
to HDFS (see Figure 14-3 ). Further tiers may be warranted for very large numbers of
source nodes.
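As a sketch of this two-tier layout, the following Flume configuration connects a first-tier agent to a second-tier aggregator using an Avro sink/source pair, which is Flume's standard mechanism for linking tiers. The agent names (agent1, agent2), the collector hostname (collector1), the port, and the file paths are illustrative assumptions, not values from the text:

```properties
# First-tier agent: collects raw events on a source node and
# forwards them over Avro to the second tier.
# (Agent names, hostnames, ports, and paths are illustrative.)
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1

agent1.sources.source1.type = exec
agent1.sources.source1.command = tail -F /var/log/app.log
agent1.sources.source1.channels = channel1

agent1.sinks.sink1.type = avro
agent1.sinks.sink1.hostname = collector1
agent1.sinks.sink1.port = 10000
agent1.sinks.sink1.channel = channel1

agent1.channels.channel1.type = file

# Second-tier agent: receives Avro events from many first-tier
# agents and writes the aggregated stream to HDFS.
agent2.sources = source2
agent2.sinks = sink2
agent2.channels = channel2

agent2.sources.source2.type = avro
agent2.sources.source2.bind = 0.0.0.0
agent2.sources.source2.port = 10000
agent2.sources.source2.channels = channel2

agent2.sinks.sink2.type = hdfs
agent2.sinks.sink2.hdfs.path = /flume/events
agent2.sinks.sink2.hdfs.fileType = DataStream
agent2.sinks.sink2.channel = channel2

agent2.channels.channel2.type = file
```

Because many first-tier agents point their Avro sinks at the same second-tier Avro source, each HDFS file written by agent2 contains events from the whole group of source nodes rather than from a single one.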