Both approaches require reconfiguring Flume and running the receiver on a node on
a configured port (not your existing Spark or Flume ports). To use either of them, we
have to include the Maven artifact spark-streaming-flume_2.10 in our project.
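For example, with Maven the dependency can be declared as follows; the version shown here is illustrative and should be kept in sync with the Spark version you are running against:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-flume_2.10</artifactId>
  <!-- illustrative version: use the one matching your Spark installation -->
  <version>1.2.0</version>
</dependency>
```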
Figure 10-8. Flume receiver options
Push-based receiver
The push-based approach can be set up quickly but does not use transactions to receive data. In this approach, the receiver acts as an Avro sink, and we need to configure Flume to send the data to the Avro sink (Example 10-34). The provided FlumeUtils object sets up the receiver to be started on a specific worker's hostname and port (Examples 10-35 and 10-36). These must match those in our Flume configuration.
Example 10-34. Flume configuration for Avro sink
a1.sinks = avroSink
a1.sinks.avroSink.type = avro
a1.sinks.avroSink.channel = memoryChannel
a1.sinks.avroSink.hostname = receiver-hostname
a1.sinks.avroSink.port = port-used-for-avro-sink-not-spark-port
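With the sink configured, the agent can be launched with the standard flume-ng launcher. The file and directory names below are assumptions for illustration; the agent name a1 matches the configuration fragment above:

```shell
# Launch the Flume agent named "a1" with the Avro-sink configuration.
# Paths are illustrative; adjust them to your Flume installation.
flume-ng agent \
  --conf ./conf \
  --conf-file ./conf/avro-sink.conf \
  --name a1
```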
Example 10-35. FlumeUtils agent in Scala
val events = FlumeUtils.createStream(ssc, receiverHostname, receiverPort)
Example 10-36. FlumeUtils agent in Java
JavaDStream<SparkFlumeEvent> events =
    FlumeUtils.createStream(ssc, receiverHostname, receiverPort);
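Each element of the resulting stream wraps an Avro event whose body arrives as a raw ByteBuffer, so a common first step is decoding that body into a string. The following is a minimal, JDK-only sketch of just the decoding step; decodeBody is a hypothetical helper, not part of the Spark or Flume APIs:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DecodeBody {
    // Decode an event body (a ByteBuffer, as Flume delivers it) into a
    // UTF-8 string -- typically the first transformation applied to
    // events received from FlumeUtils.createStream.
    static String decodeBody(ByteBuffer body) {
        byte[] bytes = new byte[body.remaining()];
        // duplicate() reads the bytes without disturbing the original
        // buffer's position.
        body.duplicate().get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical payload standing in for a real Flume event body.
        ByteBuffer payload =
            ByteBuffer.wrap("hello flume".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeBody(payload)); // prints "hello flume"
    }
}
```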
Despite its simplicity, the disadvantage of this approach is its lack of transactions. This increases the chance of losing small amounts of data if the worker node running the receiver fails.