An Example
To show how Flume works, let's start with a setup that:
1. Watches a local directory for new text files
2. Sends each line of each file to the console as files are added
We'll add the files by hand, but it's easy to imagine a process like a web server creating
new files that we want to continuously ingest with Flume. Also, in a real system, rather
than just logging the file contents we would write the contents to HDFS for subsequent
processing — we'll see how to do that later in the chapter.
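To make the two steps above concrete before turning to Flume itself, here is a toy stand-in written in Python. It is not how Flume is implemented: the function name poll_once, the in-memory seen set, and the temporary directory are all assumptions made for illustration.

```python
import os
import tempfile


def poll_once(directory, seen):
    """Return (filename, line) pairs for files that earlier polls have not seen."""
    events = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name in seen or not os.path.isfile(path):
            continue
        seen.add(name)  # remember the file so it is only read once
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                events.append((name, line.rstrip("\n")))
    return events


# Small demonstration: add a file by hand, then poll the directory.
demo_dir = tempfile.mkdtemp()
demo_seen = set()
with open(os.path.join(demo_dir, "file1.txt"), "w", encoding="utf-8") as out:
    out.write("Hello Flume\n")
for filename, text in poll_once(demo_dir, demo_seen):
    print(f"{filename}: {text}")
```

A real agent would run such a poll continuously and hand each line to a channel rather than printing it directly.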
In this example, the Flume agent runs a single source-channel-sink, configured using a Java
properties file. The configuration controls the types of sources, sinks, and channels that are
used, as well as how they are connected together. For this example, we'll use the
configuration in Example 14-1.
Example 14-1. Flume configuration using a spooling directory source and a logger sink
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir
agent1.sinks.sink1.type = logger
agent1.channels.channel1.type = file
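Because the configuration also determines how components are connected, it can help to check the wiring before starting an agent. The following Python sketch is one way to do that; it is not part of Flume, and the helper names parse_properties and check_wiring are hypothetical. It assumes simple "key = value" lines only and does not handle the full Java properties syntax.

```python
CONFIG = """\
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir
agent1.sinks.sink1.type = logger
agent1.channels.channel1.type = file
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """List every source or sink that refers to a channel the agent does not declare."""
    channels = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for source in props.get(f"{agent}.sources", "").split():
        # A source names its channels with the plural "channels" key.
        for ch in props.get(f"{agent}.sources.{source}.channels", "").split():
            if ch not in channels:
                problems.append(f"source {source} uses undeclared channel {ch!r}")
    for sink in props.get(f"{agent}.sinks", "").split():
        # A sink names its single channel with the singular "channel" key.
        ch = props.get(f"{agent}.sinks.{sink}.channel", "")
        if ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch!r}")
    return problems


print(check_wiring(parse_properties(CONFIG), "agent1") or "wiring OK")
```

Note the asymmetry the check relies on: the source property is channels (plural), while the sink property is channel (singular), exactly as in Example 14-1.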
Property names form a hierarchy with the agent name at the top level. In this example, we
have a single agent, called agent1. The names for the different components in an agent
are defined at the next level, so for example agent1.sources lists the names of the
sources that should be run in agent1 (here it is a single source, source1). Similarly,
agent1 has a sink (sink1) and a channel (channel1).
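The hierarchy described above can be sketched by grouping the keys of Example 14-1 by component. This is an illustration only, not Flume's own parsing code; the function name group_by_component is hypothetical, and the properties are typed in as a Python dictionary rather than read from a file.

```python
PROPS = {
    "agent1.sources": "source1",
    "agent1.sinks": "sink1",
    "agent1.channels": "channel1",
    "agent1.sources.source1.channels": "channel1",
    "agent1.sinks.sink1.channel": "channel1",
    "agent1.sources.source1.type": "spooldir",
    "agent1.sources.source1.spoolDir": "/tmp/spooldir",
    "agent1.sinks.sink1.type": "logger",
    "agent1.channels.channel1.type": "file",
}


def group_by_component(props, agent):
    """Map component kind -> component name -> that component's own properties."""
    grouped = {}
    for kind in ("sources", "sinks", "channels"):
        grouped[kind] = {}
        # Second level: agent1.sources, agent1.sinks, agent1.channels list names.
        for name in props.get(f"{agent}.{kind}", "").split():
            prefix = f"{agent}.{kind}.{name}."
            # Third level: everything under agent1.<kind>.<name>. belongs to it.
            grouped[kind][name] = {
                key[len(prefix):]: value
                for key, value in props.items()
                if key.startswith(prefix)
            }
    return grouped


for kind, components in group_by_component(PROPS, "agent1").items():
    for name, settings in components.items():
        print(kind, name, settings)
```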
The properties for each component are defined at the next level of the hierarchy. The
configuration properties that are available for a component depend on the type of the
component. In this case, agent1.sources.source1.type is set to spooldir, which is a
spooling directory source that monitors a spooling directory for new files. The spooling
directory source defines a spoolDir property, so for source1 the full key is