agent1.sources.source1.interceptors = interceptor1
agent1.sources.source1.interceptors.interceptor1.type = timestamp
Using the timestamp interceptor ensures that the timestamps closely reflect the times at
which the events were created. For some applications, using a timestamp for when the
event was written to HDFS might be sufficient. Be aware, though, that when there are
multiple tiers of Flume agents there can be a significant difference between creation time
and write time, especially in the event of agent downtime (see Distribution: Agent Tiers).
For these cases, the HDFS sink has a setting, hdfs.useLocalTimeStamp, which will
use a timestamp generated by the Flume agent running the HDFS sink.
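As a sketch, assuming the agent and sink are named agent1 and sink1, a sink configured
this way might look like the following (the path, with its date escape sequences, is
illustrative):
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /tmp/flume/%Y-%m-%d
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true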
File Formats
It's normally a good idea to store your data in a binary format, since the resulting
files are smaller than they would be if you used text. For the HDFS sink, the file
format is controlled by hdfs.fileType in combination with a few other properties.
If unspecified, hdfs.fileType defaults to SequenceFile, which writes events
to a sequence file with LongWritable keys that contain the event timestamp (or the
current time if the timestamp header is not present) and BytesWritable values that
contain the event body. It's possible to use Text Writable values in the sequence file
instead of BytesWritable by setting hdfs.writeFormat to Text.
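For illustration, a minimal sink configuration along these lines (the agent, sink, and
path names are assumed; hdfs.fileType is shown explicitly even though SequenceFile
is the default):
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /tmp/flume
agent1.sinks.sink1.hdfs.fileType = SequenceFile
agent1.sinks.sink1.hdfs.writeFormat = Text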
The configuration is a little different for Avro files. The hdfs.fileType property is set
to DataStream, just like for plain text. Additionally, serializer (note the lack of an
hdfs. prefix) must be set to avro_event. To enable compression, set the
serializer.compressionCodec property. Here is an example of an HDFS sink configured
to write Snappy-compressed Avro files:
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /tmp/flume
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .avro
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.serializer = avro_event
agent1.sinks.sink1.serializer.compressionCodec = snappy
An event is represented as an Avro record with two fields: headers , an Avro map with
string values, and body , an Avro bytes field.
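In Avro schema terms, that record has roughly the following shape (a sketch for
illustration; the record name Event is assumed, not taken from the serializer's actual
schema):
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "headers", "type": {"type": "map", "values": "string"}},
    {"name": "body", "type": "bytes"}
  ]
}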
If you want to use a custom Avro schema, there are a couple of options. If you have Avro
in-memory objects that you want to send to Flume, then the Log4jAppender is appropriate.