Both approaches require reconfiguring Flume and running the receiver on a node on
a configured port (not your existing Spark or Flume ports). To use either of them, we
have to include the Maven artifact spark-streaming-flume_2.10 in our project.
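For example, with Maven the dependency can be declared as follows; the version shown here is illustrative and should be kept in sync with the Spark version you are running against:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-flume_2.10</artifactId>
  <!-- illustrative version: use the one matching your Spark installation -->
  <version>1.2.0</version>
</dependency>
```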
Figure 10-8. Flume receiver options
Push-based receiver
The push-based approach can be set up quickly but does not use transactions to receive data. In this approach, the receiver acts as an Avro sink, and we need to configure Flume to send the data to the Avro sink (Example 10-34). The provided FlumeUtils object sets up the receiver to be started on a specific worker's hostname and port (Examples 10-35 and 10-36). These must match those in our Flume configuration.
Example 10-34. Flume configuration for Avro sink
a1.sinks = avroSink
a1.sinks.avroSink.type = avro
a1.sinks.avroSink.channel = memoryChannel
a1.sinks.avroSink.hostname = receiver-hostname
a1.sinks.avroSink.port = port-used-for-avro-sink-not-spark-port
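With the sink configured, the agent can be launched with the standard flume-ng launcher. The file and directory names below are assumptions for illustration; the agent name a1 matches the configuration fragment above:

```shell
# Launch the Flume agent named "a1" with the Avro-sink configuration.
# Paths are illustrative; adjust them to your Flume installation.
flume-ng agent \
  --conf ./conf \
  --conf-file ./conf/avro-sink.conf \
  --name a1
```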
Example 10-35. FlumeUtils agent in Scala
val events = FlumeUtils.createStream(ssc, receiverHostname, receiverPort)
Example 10-36. FlumeUtils agent in Java
JavaDStream<SparkFlumeEvent> events =
    FlumeUtils.createStream(ssc, receiverHostname, receiverPort);
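Each element of the resulting stream wraps an Avro event whose body arrives as a raw ByteBuffer, so a common first step is decoding that body into a string. The following is a minimal, JDK-only sketch of just the decoding step; decodeBody is a hypothetical helper, not part of the Spark or Flume APIs:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DecodeBody {
    // Decode an event body (a ByteBuffer, as Flume delivers it) into a
    // UTF-8 string -- typically the first transformation applied to
    // events received from FlumeUtils.createStream.
    static String decodeBody(ByteBuffer body) {
        byte[] bytes = new byte[body.remaining()];
        // duplicate() reads the bytes without disturbing the original
        // buffer's position.
        body.duplicate().get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical payload standing in for a real Flume event body.
        ByteBuffer payload =
            ByteBuffer.wrap("hello flume".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeBody(payload)); // prints "hello flume"
    }
}
```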
Despite its simplicity, the disadvantage of this approach is its lack of transactions. This increases the chance of losing small amounts of data if the worker node running the receiver fails.