CHANNELS: {channel1={ parameters:{checkpointInterval=60000, capacity=2000000,
maxFileSize=10737418240, type=FILE} }}
SINKS: {sink1={ parameters:{hdfs.path=hdfs://hc1nn/flume/messages, hdfs.batchSize=100,
hdfs.rollInterval=0, hdfs.rollSize=1000000, type=hdfs, channel=channel1} }}
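The channel and sink summaries above correspond to a properties-style agent configuration file. The following is a sketch of what that file would contain; every value is taken from the log dump above (and the `agent1` prefix from the configuration line quoted later), while the source definition is omitted because it is not shown here:

```properties
# Sketch of the agent configuration that would produce the channel and
# sink summaries in the startup log. The source definition is omitted
# because it does not appear in this excerpt.
agent1.channels = channel1
agent1.sinks    = sink1

agent1.channels.channel1.type               = FILE
agent1.channels.channel1.capacity           = 2000000
agent1.channels.channel1.checkpointInterval = 60000
agent1.channels.channel1.maxFileSize        = 10737418240

agent1.sinks.sink1.type              = hdfs
agent1.sinks.sink1.channel           = channel1
agent1.sinks.sink1.hdfs.path         = hdfs://hc1nn/flume/messages
agent1.sinks.sink1.hdfs.batchSize    = 100
agent1.sinks.sink1.hdfs.rollInterval = 0
agent1.sinks.sink1.hdfs.rollSize     = 1000000
```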
Flume then sets up the file-based channel:
2014-07-26 17:50:02,858 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.channel.file.FileChannel.
start(FileChannel.java:254)] Starting FileChannel channel1 { dataDirs: [/home/hadoop/.flume/file-
channel/data] }...
The channel on the Linux file system contains checkpoint and log data:
[hadoop@hc1nn flume]$ ls -l $HOME/.flume/file-channel/*
/home/hadoop/.flume/file-channel/checkpoint:
total 15652
-rw-rw-r--. 1 hadoop hadoop 16008232 Jul 26 17:51 checkpoint
-rw-rw-r--. 1 hadoop hadoop 25 Jul 26 17:51 checkpoint.meta
-rw-rw-r--. 1 hadoop hadoop 32 Jul 26 17:51 inflightputs
-rw-rw-r--. 1 hadoop hadoop 32 Jul 26 17:51 inflighttakes
drwxrwxr-x. 2 hadoop hadoop 4096 Jul 26 17:50 queueset
/home/hadoop/.flume/file-channel/data:
total 2060
-rw-rw-r--. 1 hadoop hadoop 0 Jul 26 15:44 log-6
-rw-rw-r--. 1 hadoop hadoop 47 Jul 26 15:44 log-6.meta
-rw-rw-r--. 1 hadoop hadoop 1048576 Jul 26 15:55 log-7
-rw-rw-r--. 1 hadoop hadoop 47 Jul 26 15:56 log-7.meta
-rw-rw-r--. 1 hadoop hadoop 1048576 Jul 26 17:50 log-8
-rw-rw-r--. 1 hadoop hadoop 47 Jul 26 17:51 log-8.meta
The Flume agent sets up the data sink by creating a single empty file on HDFS. The log message indicating this is
as follows:
2014-07-26 17:50:10,532 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.
flume.sink.hdfs.BucketWriter.open(BucketWriter.java:220)] Creating hdfs://hc1nn/flume/messages/
FlumeData.1406353810397.tmp
Running the script flume_show_hdfs.sh under the Linux hadoop account shows the Flume data
sink file on HDFS:
[hadoop@hc1nn flume]$ ./flume_show_hdfs.sh
Found 1 items
-rw-r--r-- 2 hadoop hadoop 0 2014-07-26 17:50 /flume/messages/FlumeData.1406353810397.tmp
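The contents of flume_show_hdfs.sh are not reproduced in this excerpt; a minimal sketch of what such a wrapper might contain, assuming it simply lists the sink directory (which requires a running HDFS cluster), is:

```shell
#!/bin/bash
# Hypothetical one-line wrapper: list the Flume sink directory on HDFS.
# The path matches hdfs.path in the sink configuration.
hdfs dfs -ls /flume/messages
```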
The script reveals that the file is empty: the fifth column of the listing, the file size in bytes, is zero. When the
number of new messages in the messages file reaches 100 (as defined by hdfs.batchSize at line 34 of the agent
configuration file), the data is written to HDFS from the channel:
34 agent1.sinks.sink1.hdfs.batchSize = 100
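The size check on the listing output can itself be scripted; a minimal sketch that pulls the size (the fifth whitespace-separated field) out of an `hdfs dfs -ls` line with awk, using the sample line copied from the output shown earlier:

```shell
# Extract the file-size column from an "hdfs dfs -ls" listing line.
# The sample line is copied from the listing above; awk field 5 is the
# size in bytes.
line='-rw-r--r--   2 hadoop hadoop          0 2014-07-26 17:50 /flume/messages/FlumeData.1406353810397.tmp'
size=$(echo "$line" | awk '{print $5}')
echo "$size"
```

While the sink file is still empty this prints 0; once the channel flushes a batch of events, the size becomes non-zero.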