Creating a Spark Streaming application
We will now work through creating our first Spark Streaming application to illustrate some
of the basic concepts around Spark Streaming that we introduced earlier.
We will expand on the example applications used in Chapter 1, Getting Up and Running with
Spark, where we used a small example dataset of product purchase events. For this example,
instead of using a static set of data, we will create a simple producer application that
will randomly generate events and send them over a network connection. We will then create
a few Spark Streaming consumer applications that will process this event stream.
The sample project for this chapter contains the code you will need. It is called
scala-spark-streaming-app. It consists of a Scala SBT project definition file, the example
application source code, and a \src\main\resources directory that contains a file called
names.csv.
The build.sbt file for the project contains the following project definition:
name := "scala-spark-streaming-app"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.1.0"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.1.0"
Note that we added dependencies on Spark MLlib and Spark Streaming, each of which brings in
Spark core as a transitive dependency.
The names.csv file contains a set of 20 randomly generated user names. We will use these
names as part of our data generation function in our producer application:
Miguel,Eric,James,Juan,Shawn,James,Doug,Gary,Frank,Janet,Michael,James,Malinda,Mike,Elaine,Kevin,Janet,Richard,Saul,Manuela
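To make the idea of a data generation function concrete, the following is a minimal sketch of how the names could be turned into random purchase events. It is an illustration only, not the sample project's code: the object name EventGenerator, the product catalogue, and the "user,product,price" line format are assumptions, and the contents of names.csv are inlined so that the sketch runs without the project resources.

```scala
import scala.util.Random

object EventGenerator {
  // Contents of names.csv, inlined so this sketch runs on its own.
  // In the project, this line would be read from src/main/resources instead.
  val namesLine: String =
    "Miguel,Eric,James,Juan,Shawn,James,Doug,Gary,Frank,Janet," +
    "Michael,James,Malinda,Mike,Elaine,Kevin,Janet,Richard,Saul,Manuela"

  val names: Seq[String] = namesLine.split(",").map(_.trim).toSeq

  // Hypothetical product catalogue (product name -> price); assumed for illustration.
  val products: Seq[(String, Double)] = Seq(
    "iPhone Cover" -> 9.99,
    "Headphones" -> 5.49,
    "Samsung Galaxy Cover" -> 8.95,
    "iPad Cover" -> 7.49
  )

  /** Generate `n` random events, each formatted as "user,product,price". */
  def generateEvents(n: Int, random: Random = new Random()): Seq[String] =
    (1 to n).map { _ =>
      val user = names(random.nextInt(names.size))
      val (product, price) = products(random.nextInt(products.size))
      s"$user,$product,$price"
    }

  def main(args: Array[String]): Unit =
    generateEvents(5).foreach(println)
}
```

A producer built along these lines would write each generated line to its network connection, and passing a seeded Random makes the generated events reproducible when testing.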