Database Reference
In-Depth Information
A Simple Example
Before we dive into the details of Spark Streaming, let's consider a simple example.
We will receive a stream of newline-delimited lines of text from a server running at
port 7777, filter only the lines that contain the word error , and print them.
Spark Streaming programs are best run as standalone applications built using Maven
or sbt. Spark Streaming, while part of Spark, ships as a separate Maven artifact and
has some additional imports you will want to add to your project. These are shown in
Examples 10-1 through 10-3 .
Example 10-1. Maven coordinates for Spark Streaming
groupId = org.apache.spark
artifactId = spark-streaming_2.10
version = 1.2.0
Example 10-2. Scala streaming imports
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.Duration
import org.apache.spark.streaming.Seconds
Example 10-3. Java streaming imports
import org.apache.spark.streaming.api.java.JavaStreamingContext ;
import org.apache.spark.streaming.api.java.JavaDStream ;
import org.apache.spark.streaming.api.java.JavaPairDStream ;
import org.apache.spark.streaming.Duration ;
import org.apache.spark.streaming.Durations ;
We will start by creating a StreamingContext, which is the main entry point for
streaming functionality. This also sets up an underlying SparkContext that it will use
to process the data. It takes as input a batch interval specifying how often to process
new data, which we set to 1 second. Next, we use socketTextStream() to create a
DStream based on text data received on port 7777 of the local machine. Then we
transform the DStream with filter() to get only the lines that contain error . Finally,
we apply the output operation print() to print some of the filtered lines. (See Exam‐
ples 10-4 and 10-5 .)
Example 10-4. Streaming filter for printing lines containing “error” in Scala
// Create a StreamingContext with a 1-second batch size from a SparkConf
val ssc = new StreamingContext ( conf , Seconds ( 1 ))
// Create a DStream using data received after connecting to port 7777 on the
Search WWH ::




Custom Search