Database Reference
In-Depth Information
A Simple Example
Before we dive into the details of Spark Streaming, let's consider a simple example.
We will receive a stream of newline-delimited lines of text from a server running at
port 7777, filter only the lines that contain the word
error
, and print them.
Spark Streaming programs are best run as standalone applications built using Maven
or sbt. Spark Streaming, while part of Spark, ships as a separate Maven artifact and
has some additional imports you will want to add to your project. These are shown in
Examples
10-1
through
10-3
.
Example 10-1. Maven coordinates for Spark Streaming
groupId = org.apache.spark
artifactId = spark-streaming_2.10
version = 1.2.0
Example 10-2. Scala streaming imports
import
org.apache.spark.streaming.StreamingContext
import
org.apache.spark.streaming.StreamingContext._
import
org.apache.spark.streaming.dstream.DStream
import
org.apache.spark.streaming.Duration
import
org.apache.spark.streaming.Seconds
Example 10-3. Java streaming imports
import
org.apache.spark.streaming.api.java.JavaStreamingContext
;
import
org.apache.spark.streaming.api.java.JavaDStream
;
import
org.apache.spark.streaming.api.java.JavaPairDStream
;
import
org.apache.spark.streaming.Duration
;
import
org.apache.spark.streaming.Durations
;
We will start by creating a StreamingContext, which is the main entry point for
streaming functionality. This also sets up an underlying SparkContext that it will use
to process the data. It takes as input a
batch interval
specifying how often to process
new data, which we set to 1 second. Next, we use
socketTextStream()
to create a
DStream based on text data received on port 7777 of the local machine. Then we
transform the DStream with
filter()
to get only the lines that contain
error
. Finally,
we apply the output operation
print()
to print some of the filtered lines. (See Exam‐
ples
10-4
and
10-5
.)
Example 10-4. Streaming filter for printing lines containing “error” in Scala
// Create a StreamingContext with a 1-second batch size from a SparkConf
val
ssc
=
new
StreamingContext
(
conf
,
Seconds
(
1
))
// Create a DStream using data received after connecting to port 7777 on the