The source type is defined as “exec” in line 22, but Flume also supports source types such as avro, thrift, syslog, jms, spooldir, twitter, seq, http, and netcat. You can also write custom sources to consume your own data types;
see the Flume user guide at flume.apache.org for more information.
The executable command is specified at line 23 as tail -F /var/log/messages . This command causes new
messages in the file to be received by the agent. Line 24 connects the source to the Flume agent channel, channel1.
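Taken together, the source definition described above would look like the following sketch; the line numbers match the discussion, though the exact layout of the original file is an assumption (note that sources take a channels list, hence the plural property name):

```
22 agent1.sources.source1.type = exec
23 agent1.sources.source1.command = tail -F /var/log/messages
24 agent1.sources.source1.channels = channel1
```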
Finally, lines 30 through 35 define the HDFS data sink:
30 agent1.sinks.sink1.type = hdfs
31 agent1.sinks.sink1.hdfs.path = hdfs://hc1nn/flume/messages
32 agent1.sinks.sink1.hdfs.rollInterval = 0
33 agent1.sinks.sink1.hdfs.rollSize = 1000000
34 agent1.sinks.sink1.hdfs.batchSize = 100
35 agent1.sinks.sink1.channel = channel1
In this example, the sink type is specified at line 30 to be HDFS, but it could also be a value like logger, avro, irc,
hbase, or a custom sink (see the Flume user guide at flume.apache.org for further alternatives). Line 31 specifies the
HDFS location as a URI, saving the data to /flume/messages.
Line 32 sets the time-based roll interval to 0, so the logs will not be rolled by time, while line 33 causes the
sink file to be rolled once it reaches 1,000,000 bytes. Line 34 specifies a batch size of 100 events per write to HDFS, and
line 35 connects the channel to the sink.
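The channel itself, channel1, is defined elsewhere in the configuration file and is not shown in this excerpt. A typical memory-channel definition would look like the following sketch; the capacity values are assumptions, not taken from the original file:

```
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
```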
For this example, I encountered the following error owing to a misconfiguration of the channel name:
2014-07-26 14:45:10,177 (conf-file-poller-0) [WARN - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSources(FlumeConfiguration.java:589)] Could not configure source source1 due to: Failed to configure component!
This error message indicated a configuration error; in this case, it was caused by leaving the “s” off the end of the
channels configuration item at line 24 (sources take a channels list, whereas sinks take a single channel). When corrected, the line reads as follows:
24 agent1.sources.source1.channels = channel1
Running the Agent
To run your Flume agent, you simply run your Bash script. In my example, to run the Flume agent agent1, I run the
CentOS Linux Bash script flume_exec_hdfs.sh, as follows:
[hadoop@hc1nn flume]$ cd $HOME/flume
[hadoop@hc1nn flume]$ ./flume_exec_hdfs.sh
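The Bash script is essentially a wrapper around the flume-ng command. A minimal sketch, assuming the agent configuration lives in a file named agent1.cfg in the current directory and the standard Flume configuration directory is /etc/flume-ng/conf (both are assumptions), might be:

```
#!/bin/bash
# Start Flume agent agent1 using the configuration file agent1.cfg
# (the configuration file name and conf directory are assumptions)
flume-ng agent \
  --conf /etc/flume-ng/conf \
  --conf-file agent1.cfg \
  --name agent1 \
  -Dflume.root.logger=DEBUG,console
```

The --name value must match the agent name used as the prefix in the configuration file, agent1 in this case, or the agent will start with no configured components.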
This writes voluminous log output to the session window and to the logs under /var/log/flume-ng. For my
example, I don't provide the full output listing here, but I identify the important parts. Flume validates the agent
configuration and displays the source, channel, and sink as defined:
2014-07-26 17:50:01,377 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:313)] Starting validation of configuration for agent: agent1, initial-configuration: AgentConfiguration[agent1]
SOURCES: {source1={ parameters:{command=tail -F /var/log/messages, channels=channel1, type=exec} }}
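With the agent running, the arrival of data under the configured sink path can be confirmed from HDFS. A quick check, in the same session style as above, would be:

```
[hadoop@hc1nn flume]$ hdfs dfs -ls /flume/messages
```

Flume names the files it writes itself (by default with a FlumeData prefix and a timestamp), rolling to a new file each time the 1,000,000-byte rollSize threshold is reached.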