Flume - Hadoop: The Definitive Guide

agent1.sources = source1
agent1.sinks = sink1a sink1b
agent1.sinkgroups = sinkgroup1
agent1.channels = channel1

agent1.sources.source1.channels = channel1
agent1.sinks.sink1a.channel = channel1
agent1.sinks.sink1b.channel = channel1

agent1.sinkgroups.sinkgroup1.sinks = sink1a sink1b
agent1.sinkgroups.sinkgroup1.processor.type = load_balance
agent1.sinkgroups.sinkgroup1.processor.backoff = true

agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir

agent1.sinks.sink1a.type = avro
agent1.sinks.sink1a.hostname = localhost
agent1.sinks.sink1a.port = 10000

agent1.sinks.sink1b.type = avro
agent1.sinks.sink1b.hostname = localhost
agent1.sinks.sink1b.port = 10001

agent1.channels.channel1.type = file

There are two Avro sinks defined, sink1a and sink1b, which differ only in the Avro endpoint they are connected to (since we are running all the examples on localhost, it is the port that is different; for a distributed install, the hosts would differ and the ports would be the same). We also define sinkgroup1, and set its sinks to sink1a and sink1b.
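
For a distributed install, the two sinks might be pointed at different machines on a shared port, as the paragraph above describes. The following is a minimal sketch only: the hostnames host-a.example.com and host-b.example.com are hypothetical placeholders, not values from the original example.

```properties
# Hypothetical hostnames for illustration; substitute the real downstream hosts.
agent1.sinks.sink1a.type = avro
agent1.sinks.sink1a.hostname = host-a.example.com
agent1.sinks.sink1a.port = 10000

agent1.sinks.sink1b.type = avro
agent1.sinks.sink1b.hostname = host-b.example.com
# Same port as sink1a; only the host differs.
agent1.sinks.sink1b.port = 10000
```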

The processor type is set to load_balance, which attempts to distribute the event flow over both sinks in the group, using a round-robin selection mechanism (you can change this using the processor.selector property). If a sink is unavailable, then the next sink is tried; if they are all unavailable, the event is not removed from the channel, just as in the single-sink case. By default, sink unavailability is not remembered by the sink processor, so failing sinks are retried for every batch of events being delivered. This can be inefficient, so we have set the processor.backoff property to true, which changes the behavior so that failing sinks are blacklisted for an exponentially increasing timeout period (up to a maximum period of 30 seconds, controlled by processor.selector.maxTimeOut).
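
The selection and backoff settings discussed above could be adjusted along the lines of the following sketch. The values are illustrative assumptions rather than recommendations: random is shown as an alternative to the default round-robin selector, and the 10000 cap is arbitrary (the timeout is given in milliseconds, so the 30-second default corresponds to 30000).

```properties
agent1.sinkgroups.sinkgroup1.processor.type = load_balance
agent1.sinkgroups.sinkgroup1.processor.backoff = true
# Pick sinks at random instead of round-robin (illustrative choice).
agent1.sinkgroups.sinkgroup1.processor.selector = random
# Cap the backoff period at 10 seconds rather than the default 30 (value in ms; illustrative).
agent1.sinkgroups.sinkgroup1.processor.selector.maxTimeOut = 10000
```

A lower cap means a recovered sink is put back into rotation sooner, at the cost of more frequent retries against a sink that is still down.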
