Then you install the component that allows Flume to start up at server boot time:
yum install flume-ng-agent
Finally, you install Flume documentation:
yum install flume-ng-doc
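Before moving on, it is worth checking that the packages installed correctly. A minimal check (assuming the CDH packaging, in which the flume-ng-agent package installs an init script of the same name) is to ask the flume-ng executable for its version and query the service status:
[root@hc1nn ~]# flume-ng version
[root@hc1nn ~]# service flume-ng-agent status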
Next, you set up the basic configuration:
[root@hc1nn etc]# cd /etc/flume-ng/conf
[root@hc1nn conf]# ls
flume.conf flume-conf.properties.template flume-env.sh.template log4j.properties
[root@hc1nn conf]# cp flume-conf.properties.template flume.conf
You won't customize the configuration for the agent now; an agent-based configuration file will be defined shortly.
As with many other Apache components in the Cloudera stack, you can find the Flume configuration and logs in
the standard places. For instance, logs are located under /var/log/flume-ng, the configuration files are under
/etc/flume-ng/conf, and the flume-ng executable file is /usr/bin/flume-ng.
A Simple Agent
As an example of how Flume works, I build a single Flume agent that asynchronously takes data from the CentOS
Linux message file /var/log/messages. The message file acts as the data source; the incoming events are buffered
in a single channel called channel1, stored on the local Linux file system, and the data sink writes them to an
HDFS directory called /flume/messages.
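To make that data flow concrete, the following sketch shows the general shape such an agent configuration takes. It is illustrative only; the actual agent1.cfg file is defined shortly, and the specific values used here (the exec source command, the file channel type, and the HDFS sink path) are assumptions.
# Sketch of a single-agent Flume configuration (illustrative values)
agent1.sources  = source1
agent1.channels = channel1
agent1.sinks    = sink1
# Source: tail the Linux message file
agent1.sources.source1.type     = exec
agent1.sources.source1.command  = tail -F /var/log/messages
agent1.sources.source1.channels = channel1
# Channel: buffer events on the local Linux file system
agent1.channels.channel1.type = file
# Sink: write the events to HDFS under /flume/messages
agent1.sinks.sink1.type      = hdfs
agent1.sinks.sink1.hdfs.path = /flume/messages
agent1.sinks.sink1.channel   = channel1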
In the Linux hadoop account I have created a number of files to run this example Flume job, display the
resulting data on HDFS, and clean up after the job. These files keep the typing to a minimum and make the job
easy to rerun, because the previous results are removed from HDFS first. You can use scripts like these if you wish.
[hadoop@hc1nn ~]$ cd $HOME/flume
[hadoop@hc1nn flume]$ ls
agent1.cfg flume_clean_hdfs.sh flume_exec_hdfs.sh flume_show_hdfs.sh
The file agent1.cfg is the Flume configuration file for the agent, while the Bash (.sh) files are for running the
agent (flume_exec_hdfs.sh), showing the results on HDFS (flume_show_hdfs.sh), and cleaning up the data on HDFS
(flume_clean_hdfs.sh). Examining each of these files in turn, we see that the show script just executes a Hadoop file
system ls command against the directory /flume/messages, where the agent will write the data.
[hadoop@hc1nn flume]$ cat flume_show_hdfs.sh
#!/bin/bash
hdfs dfs -ls /flume/messages
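The run and clean scripts are just as small. The listings below are sketches of what they might contain, built from standard flume-ng and hdfs command-line options rather than copied from the actual files: the run script starts the agent named agent1 from its configuration file, and the clean script removes the previous run's output so the job can be repeated.
#!/bin/bash
# flume_exec_hdfs.sh (sketch): start the agent defined in agent1.cfg
flume-ng agent --conf /etc/flume-ng/conf --conf-file agent1.cfg --name agent1 \
  -Dflume.root.logger=INFO,console
#!/bin/bash
# flume_clean_hdfs.sh (sketch): remove the job's output from HDFS
hdfs dfs -rm -r /flume/messages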
 