and ZeroMQ. We can include these additional receivers by adding the Maven artifact spark-streaming-[projectname]_2.10 with the same version number as Spark.
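As a sketch, pulling one of these receiver artifacts into a Maven project looks roughly like the following (the version shown is illustrative; use the version number that matches your Spark installation):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <version>[spark-version]</version>
</dependency>
```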
Apache Kafka
Apache Kafka is a popular input source due to its speed and resilience. Using the native support for Kafka, we can easily process the messages for many topics. To use it, we have to include the Maven artifact spark-streaming-kafka_2.10 in our project. The provided KafkaUtils object works on StreamingContext and JavaStreamingContext to create a DStream of your Kafka messages. Since it can subscribe to multiple topics, the DStream it creates consists of pairs of topic and message. To create a stream, we will call the createStream() method with our streaming context, a string containing comma-separated ZooKeeper hosts, the name of our consumer group (a unique name), and a map of topics to number of receiver threads to use for that topic (see Examples 10-32 and 10-33).
Example 10-32. Apache Kafka subscribing to Panda's topic in Scala
import org.apache.spark.streaming.kafka._
...
// Create a map of topics to number of receiver threads to use
val topics = List(("pandas", 1), ("logs", 1)).toMap
val topicLines = KafkaUtils.createStream(ssc, zkQuorum, group, topics)
StreamingLogInput.processLines(topicLines.map(_._2))
Example 10-33. Apache Kafka subscribing to Panda's topic in Java
import org.apache.spark.streaming.kafka.*;
...
// Create a map of topics to number of receiver threads to use
Map<String, Integer> topics = new HashMap<String, Integer>();
topics.put("pandas", 1);
topics.put("logs", 1);
JavaPairDStream<String, String> input =
  KafkaUtils.createStream(jssc, zkQuorum, group, topics);
input.print();
Apache Flume
Spark has two different receivers for use with Apache Flume (see Figure 10-8). They are as follows:
Push-based receiver
The receiver acts as an Avro sink that Flume pushes data to.
Pull-based receiver
The receiver can pull data from an intermediate custom sink, to which other processes are pushing data with Flume.
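The two modes can be sketched with the FlumeUtils object from the spark-streaming-flume artifact. This is an outline only: the hostname and port values are placeholders, and in the pull-based case a custom Spark sink must already be configured inside the Flume agent at that address.

```scala
import org.apache.spark.streaming.flume._

// Push-based: Spark acts as an Avro sink on this host/port,
// and Flume's avro sink is configured to push events to it
val pushEvents = FlumeUtils.createStream(ssc, receiverHostname, receiverPort)

// Pull-based: Spark polls a custom Spark sink running inside the Flume agent
val pullEvents = FlumeUtils.createPollingStream(ssc, sinkHostname, sinkPort)
```

Both calls return a DStream of Flume events; the pull-based approach trades a small amount of setup in the Flume agent for stronger reliability, since events stay buffered in the sink until Spark has consumed them.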