Creating a Spark Streaming application
We will now work through creating our first Spark Streaming application to illustrate some
of the basic concepts around Spark Streaming that we introduced earlier.
We will expand on the example applications used in Chapter 1, Getting Up and Running
with Spark, where we used a small example dataset of product purchase events. For this
example, instead of using a static set of data, we will create a simple producer application that
will randomly generate events and send them over a network connection. We will then
create a few Spark Streaming consumer applications that will process this event stream.
The sample project for this chapter contains the code you will need. It is called
scala-spark-streaming-app. It consists of a Scala SBT project definition file, the example
application source code, and a src/main/resources directory that contains a file
called names.csv.
The build.sbt file for the project contains the following project definition:
name := "scala-spark-streaming-app"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.1.0"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.1.0"
Note that we added dependencies on Spark MLlib and Spark Streaming. Each of these pulls in
Spark core as a transitive dependency, so we do not need to declare it separately.
The names.csv file contains a set of 20 randomly generated user names. We will use
these names as part of our data generation function in our producer application:
Miguel,Eric,James,Juan,Shawn,James,Doug,Gary,Frank,Janet,Michael,James,Malinda,Mike,Elaine,Kevin,Janet,Richard,Saul,Manuela
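To make the idea of a data generation function concrete, here is a minimal sketch of how the names might be combined with a product list to produce random purchase events. It is an illustration under stated assumptions, not the chapter's actual producer code: the object name EventGenerator, the generateEvents function, the product catalogue, and the (user, product, price) event shape are all hypothetical. The names are inlined as a string so the sketch runs without the resource file; a real producer would presumably read names.csv from the resources directory instead.

```scala
import scala.util.Random

object EventGenerator {
  // Contents of names.csv, inlined so this sketch runs without the resource file.
  val namesLine: String =
    "Miguel,Eric,James,Juan,Shawn,James,Doug,Gary,Frank,Janet," +
    "Michael,James,Malinda,Mike,Elaine,Kevin,Janet,Richard,Saul,Manuela"

  // Split the single comma-separated line into individual user names.
  val names: Seq[String] = namesLine.split(",").map(_.trim).toSeq

  // Hypothetical product catalogue (name -> price); not taken from the text.
  val products: Seq[(String, Double)] = Seq(
    "Product A" -> 9.99,
    "Product B" -> 5.49,
    "Product C" -> 8.95
  )

  // Generate n random purchase events as (user, product, price) triples.
  def generateEvents(n: Int, rng: Random): Seq[(String, String, Double)] =
    Seq.fill(n) {
      val user = names(rng.nextInt(names.size))
      val (product, price) = products(rng.nextInt(products.size))
      (user, product, price)
    }

  def main(args: Array[String]): Unit = {
    val rng = new Random()
    // Print a handful of events, one comma-separated event per line.
    generateEvents(5, rng).foreach { case (user, product, price) =>
      println(s"$user,$product,$price")
    }
  }
}
```

Passing the Random instance in as a parameter keeps the function easy to test with a fixed seed. Because the file contains repeated names (James and Janet appear more than once), those users will simply be picked somewhat more often.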