and ZeroMQ. We can include these additional receivers by adding the Maven artifact spark-streaming-[projectname]_2.10 with the same version number as Spark.
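As a sketch, pulling one of these receiver artifacts into a Maven project looks roughly like the following (the version shown is illustrative; use the version number that matches your Spark installation):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <version>[spark-version]</version>
</dependency>
```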
Apache Kafka
Apache Kafka is a popular input source due to its speed and resilience. Using the native support for Kafka, we can easily process the messages for many topics. To use it, we have to include the Maven artifact spark-streaming-kafka_2.10 in our project. The provided KafkaUtils object works on StreamingContext and JavaStreamingContext to create a DStream of your Kafka messages. Since it can subscribe to multiple topics, the DStream it creates consists of pairs of topic and message. To create a stream, we will call the createStream() method with our streaming context, a string containing comma-separated ZooKeeper hosts, the name of our consumer group (a unique name), and a map of topics to number of receiver threads to use for that topic (see Examples 10-32 and 10-33).
Example 10-32. Apache Kafka subscribing to Panda's topic in Scala
import org.apache.spark.streaming.kafka._
...
// Create a map of topics to number of receiver threads to use
val topics = List(("pandas", 1), ("logs", 1)).toMap
val topicLines = KafkaUtils.createStream(ssc, zkQuorum, group, topics)
StreamingLogInput.processLines(topicLines.map(_._2))
Example 10-33. Apache Kafka subscribing to Panda's topic in Java
import org.apache.spark.streaming.kafka.*;
...
// Create a map of topics to number of receiver threads to use
Map<String, Integer> topics = new HashMap<String, Integer>();
topics.put("pandas", 1);
topics.put("logs", 1);
JavaPairDStream<String, String> input =
  KafkaUtils.createStream(jssc, zkQuorum, group, topics);
input.print();
Apache Flume
Spark has two different receivers for use with Apache Flume (see Figure 10-8). They are as follows:
Push-based receiver
The receiver acts as an Avro sink that Flume pushes data to.
Pull-based receiver
The receiver can pull data from an intermediate custom sink, to which other processes are pushing data with Flume.
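The two modes can be sketched with the FlumeUtils object from the spark-streaming-flume artifact. This is an outline only: the hostname and port values are placeholders, and in the pull-based case a custom Spark sink must already be configured inside the Flume agent at that address.

```scala
import org.apache.spark.streaming.flume._

// Push-based: Spark acts as an Avro sink on this host/port,
// and Flume's avro sink is configured to push events to it
val pushEvents = FlumeUtils.createStream(ssc, receiverHostname, receiverPort)

// Pull-based: Spark polls a custom Spark sink running inside the Flume agent
val pullEvents = FlumeUtils.createPollingStream(ssc, sinkHostname, sinkPort)
```

Both calls return a DStream of Flume events; the pull-based approach trades a small amount of setup in the Flume agent for stronger reliability, since events stay buffered in the sink until Spark has consumed them.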