The source type is defined as “exec” in line 22, but Flume also supports source types such as avro, thrift, syslog, jms, spooldir, twitter, seq, http, and netcat. You can also write custom sources to consume your own data types;
see the Flume user guide at flume.apache.org for more information.
The executable command is specified at line 23 as tail -F /var/log/messages . This command causes new
messages in the file to be received by the agent. Line 24 connects the source to the Flume agent channel, channel1.
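Taken together, the source definition described above would look like the following sketch; the line numbers match the discussion, though the exact layout of the original file is an assumption (note that sources take a channels list, hence the plural property name):

```
22 agent1.sources.source1.type = exec
23 agent1.sources.source1.command = tail -F /var/log/messages
24 agent1.sources.source1.channels = channel1
```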
Finally, lines 30 through 35 define the HDFS data sink:
30 agent1.sinks.sink1.type = hdfs
31 agent1.sinks.sink1.hdfs.path = hdfs://hc1nn/flume/messages
32 agent1.sinks.sink1.hdfs.rollInterval = 0
33 agent1.sinks.sink1.hdfs.rollSize = 1000000
34 agent1.sinks.sink1.hdfs.batchSize = 100
35 agent1.sinks.sink1.channel = channel1
In this example, the sink type is specified at line 30 to be HDFS, but it could also be a value like logger, avro, irc,
hbase, or a custom sink (see the Flume user guide at flume.apache.org for further alternatives). Line 31 specifies the
HDFS location as a URI, saving the data to /flume/messages.
Line 32 sets the time-based roll interval to 0, so the logs will not be rolled by time, while line 33 causes the
sink file to be rolled once it reaches 1,000,000 bytes. Line 34 specifies a batch size of 100 events per write to HDFS, and
line 35 connects the channel to the sink.
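The channel itself, channel1, is defined elsewhere in the configuration file and is not shown in this excerpt. A typical memory-channel definition would look like the following sketch; the capacity values are assumptions, not taken from the original file:

```
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
```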
For this example, I encountered the following error owing to a misconfiguration of the channel name:
2014-07-26 14:45:10,177 (conf-file-poller-0) [WARN - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSources(FlumeConfiguration.java:589)] Could not configure source source1 due to: Failed to configure component!
This error message indicated a configuration error; in this case, it was caused by leaving the “s” off the end of the
channels configuration item at line 24 (sources take a channels list, whereas sinks take a single channel). When corrected, the line reads as follows:
24 agent1.sources.source1.channels = channel1
Running the Agent
To run your Flume agent, you simply run your Bash script. In my example, to run the Flume agent agent1, I run the
CentOS Linux Bash script flume_exec_hdfs.sh, as follows:
[hadoop@hc1nn flume]$ cd $HOME/flume
[hadoop@hc1nn flume]$ ./flume_exec_hdfs.sh
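The Bash script is essentially a wrapper around the flume-ng command. A minimal sketch, assuming the agent configuration lives in a file named agent1.cfg in the current directory and the standard Flume configuration directory is /etc/flume-ng/conf (both are assumptions), might be:

```
#!/bin/bash
# Start Flume agent agent1 using the configuration file agent1.cfg
# (the configuration file name and conf directory are assumptions)
flume-ng agent \
  --conf /etc/flume-ng/conf \
  --conf-file agent1.cfg \
  --name agent1 \
  -Dflume.root.logger=DEBUG,console
```

The --name value must match the agent name used as the prefix in the configuration file, agent1 in this case, or the agent will start with no configured components.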
This writes voluminous log output to the session window and to the logs under /var/log/flume-ng. For my
example, I don't provide the full output listing here, but I identify the important parts. Flume validates the agent
configuration and displays the source, channel, and sink as defined:
2014-07-26 17:50:01,377 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:313)] Starting validation of configuration for agent: agent1, initial-configuration: AgentConfiguration[agent1]
SOURCES: {source1={ parameters:{command=tail -F /var/log/messages, channels=channel1, type=exec} }}
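With the agent running, the arrival of data under the configured sink path can be confirmed from HDFS. A quick check, in the same session style as above, would be:

```
[hadoop@hc1nn flume]$ hdfs dfs -ls /flume/messages
```

Flume names the files it writes itself (by default with a FlumeData prefix and a timestamp), rolling to a new file each time the 1,000,000-byte rollSize threshold is reached.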