and ZeroMQ. We can include these additional receivers by adding the Maven artifact
spark-streaming-[projectname]_2.10 with the same version number as Spark.
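For example, to pull in the Kafka receiver discussed below, a dependency entry along these lines could be added to the project's pom.xml (the version shown here is a placeholder; it must match the Spark version your project already depends on):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <!-- Placeholder version: use the same version number as your Spark dependency -->
  <version>1.2.0</version>
</dependency>
```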
Apache Kafka
Apache Kafka is a popular input source due to its speed and resilience. Using the native
support for Kafka, we can easily process messages from multiple topics. To use it, we
have to include the Maven artifact spark-streaming-kafka_2.10 in our project. The
provided KafkaUtils object works on StreamingContext and JavaStreamingContext
to create a DStream of your Kafka messages. Since it can subscribe to multiple topics,
the DStream it creates consists of pairs of topic and message. To create a stream, we
will call the createStream() method with our streaming context, a string containing
comma-separated ZooKeeper hosts, the name of our consumer group (a unique
name), and a map of topics to number of receiver threads to use for that topic (see
Examples 10-32 and 10-33 ).
Example 10-32. Apache Kafka subscribing to Panda's topic in Scala
import org.apache.spark.streaming.kafka._
...
// Create a map of topics to number of receiver threads to use
val topics = List(("pandas", 1), ("logs", 1)).toMap
val topicLines = KafkaUtils.createStream(ssc, zkQuorum, group, topics)
StreamingLogInput.processLines(topicLines.map(_._2))
Example 10-33. Apache Kafka subscribing to Panda's topic in Java
import org.apache.spark.streaming.kafka.*;
...
// Create a map of topics to number of receiver threads to use
Map<String, Integer> topics = new HashMap<String, Integer>();
topics.put("pandas", 1);
topics.put("logs", 1);
JavaPairDStream<String, String> input =
  KafkaUtils.createStream(jssc, zkQuorum, group, topics);
input.print();
Apache Flume
Spark has two different receivers for use with Apache Flume (see Figure 10-8 ). They
are as follows:
Push-based receiver
The receiver acts as an Avro sink that Flume pushes data to.
Pull-based receiver
The receiver can pull data from an intermediate custom sink, to which other
processes are pushing data with Flume.
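As a sketch of the two styles in Scala, the FlumeUtils object (from the spark-streaming-flume_2.10 artifact) exposes one factory method for each; the hostnames and port used here are placeholders, not values from the text:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume._

val conf = new SparkConf().setAppName("FlumeInput")
val ssc = new StreamingContext(conf, Seconds(1))

// Push-based: Spark runs an Avro sink on this host/port, and the Flume
// agent's avro sink is configured to push events to it
val pushedEvents = FlumeUtils.createStream(ssc, "receiver.example.com", 7777)

// Pull-based: the Flume agent writes to a custom Spark sink, and Spark
// polls that sink for events at its own pace
val pulledEvents = FlumeUtils.createPollingStream(ssc, "sink.example.com", 7777)
```

The pull-based style trades a little extra Flume configuration for stronger reliability, since events stay buffered in the custom sink until Spark has received them.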