Then you install the component that allows Flume to start up at server boot time:
yum install flume-ng-agent
Finally, you install Flume documentation:
yum install flume-ng-doc
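Before moving on, it is worth checking that the packages installed correctly. A minimal check (assuming the CDH packaging, in which the flume-ng-agent package installs an init script of the same name) is to ask the flume-ng executable for its version and query the service status:
[root@hc1nn ~]# flume-ng version
[root@hc1nn ~]# service flume-ng-agent status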
Next, you set up the basic configuration:
[root@hc1nn etc]# cd /etc/flume-ng/conf
[root@hc1nn conf]# ls
flume.conf flume-conf.properties.template flume-env.sh.template log4j.properties
[root@hc1nn conf]# cp flume-conf.properties.template flume.conf
You won't customize the configuration for the agent now; an agent-based configuration file will be defined shortly.
As with many other Apache components in the Cloudera stack, you can find the Flume configuration and logs in
the standard places. For instance, logs are located under /var/log/flume-ng, the configuration files are under
/etc/flume-ng/conf, and the flume-ng executable file is /usr/bin/flume-ng.
A Simple Agent
As an example of how Flume works, I build a single Flume agent that asynchronously takes data from the CentOS
Linux message file /var/log/messages. The message file acts as the data source; the incoming events are buffered
in a single channel called channel1, stored on the local Linux file system, and the data sink writes them to an
HDFS directory called /flume/messages.
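To make that data flow concrete, the following sketch shows the general shape such an agent configuration takes. It is illustrative only; the actual agent1.cfg file is defined shortly, and the specific values used here (the exec source command, the file channel type, and the HDFS sink path) are assumptions.
# Sketch of a single-agent Flume configuration (illustrative values)
agent1.sources  = source1
agent1.channels = channel1
agent1.sinks    = sink1
# Source: tail the Linux message file
agent1.sources.source1.type     = exec
agent1.sources.source1.command  = tail -F /var/log/messages
agent1.sources.source1.channels = channel1
# Channel: buffer events on the local Linux file system
agent1.channels.channel1.type = file
# Sink: write the events to HDFS under /flume/messages
agent1.sinks.sink1.type      = hdfs
agent1.sinks.sink1.hdfs.path = /flume/messages
agent1.sinks.sink1.channel   = channel1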
In the Linux hadoop account I have created a number of files to run this example Flume job, display the
resulting data on HDFS, and clean up after the job. These files keep the typing to a minimum and make the job
easy to rerun, because the previous results are removed from HDFS first. You can use scripts like these if you wish.
[hadoop@hc1nn ~]$ cd $HOME/flume
[hadoop@hc1nn flume]$ ls
agent1.cfg flume_clean_hdfs.sh flume_exec_hdfs.sh flume_show_hdfs.sh
The file agent1.cfg is the Flume configuration file for the agent, while the Bash (.sh) files are for running the
agent (flume_exec_hdfs.sh), showing the results on HDFS (flume_show_hdfs.sh), and cleaning up the data on HDFS
(flume_clean_hdfs.sh). Examining each of these files in turn, we see that the show script just executes a Hadoop file
system ls command against the directory /flume/messages, where the agent will write the data.
[hadoop@hc1nn flume]$ cat flume_show_hdfs.sh
#!/bin/bash
hdfs dfs -ls /flume/messages
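The run and clean scripts are just as small. The listings below are sketches of what they might contain, built from standard flume-ng and hdfs command-line options rather than copied from the actual files: the run script starts the agent named agent1 from its configuration file, and the clean script removes the previous run's output so the job can be repeated.
#!/bin/bash
# flume_exec_hdfs.sh (sketch): start the agent defined in agent1.cfg
flume-ng agent --conf /etc/flume-ng/conf --conf-file agent1.cfg --name agent1 \
  -Dflume.root.logger=INFO,console
#!/bin/bash
# flume_clean_hdfs.sh (sketch): remove the job's output from HDFS
hdfs dfs -rm -r /flume/messages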
 