Real-time Machine Learning with Spark Streaming - Machine Learning with Spark

Creating a Spark Streaming application

We will now work through creating our first Spark Streaming application to illustrate some of the basic Spark Streaming concepts that we introduced earlier.

We will expand on the example applications used in Chapter 1, Getting Up and Running with Spark, where we used a small example dataset of product purchase events. For this example, instead of using a static set of data, we will create a simple producer application that will randomly generate events and send them over a network connection. We will then create a few Spark Streaming consumer applications that will process this event stream.
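The producer idea described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the chapter's own producer: the object name EventProducerSketch, the sample users and products, port 9999, and the comma-separated "user,product,price" layout are all hypothetical choices made for illustration.

```scala
import java.io.{IOException, OutputStreamWriter, Writer}
import java.net.ServerSocket
import scala.util.Random

object EventProducerSketch {
  // Hypothetical sample data, used for illustration only.
  val users: Seq[String] = Seq("Miguel", "Eric", "James")
  val products: Seq[(String, Double)] =
    Seq("iPhone Cover" -> 9.99, "Headphones" -> 5.49)

  // Build one purchase event as a "user,product,price" line (assumed layout).
  def randomEvent(random: Random): String = {
    val user = users(random.nextInt(users.size))
    val (product, price) = products(random.nextInt(products.size))
    s"$user,$product,$price"
  }

  // Write `count` random events to any Writer, one event per line.
  def writeEvents(out: Writer, count: Int, random: Random): Unit = {
    (1 to count).foreach(_ => out.write(randomEvent(random) + "\n"))
    out.flush()
  }

  def main(args: Array[String]): Unit = {
    val random = new Random()
    val listener = new ServerSocket(9999) // assumed port
    try {
      while (true) {
        val socket = listener.accept() // blocks until a consumer connects
        try {
          val out = new OutputStreamWriter(socket.getOutputStream, "UTF-8")
          while (true) {
            // Send a small random batch (1 to 5 events) once per second.
            writeEvents(out, 1 + random.nextInt(5), random)
            Thread.sleep(1000)
          }
        } catch {
          case _: IOException => () // consumer went away; wait for the next one
        } finally {
          socket.close()
        }
      }
    } finally {
      listener.close()
    }
  }
}
```

With the sketch running, any line-oriented TCP client connected to the assumed port should receive the generated events. A Spark Streaming consumer would typically read such a stream with socketTextStream on a StreamingContext, pointed at the same host and port. Keeping writeEvents independent of the socket makes the generation logic easy to exercise against an in-memory Writer.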

The sample project for this chapter contains the code you will need. It is called scala-spark-streaming-app. It consists of a Scala SBT project definition file, the example application source code, and a src/main/resources directory that contains a file called names.csv.

The build.sbt file for the project contains the following project definition:

name := "scala-spark-streaming-app"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.1.0"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.1.0"

Note that we added dependencies on Spark MLlib and Spark Streaming, each of which brings in Spark core as a transitive dependency.

The names.csv file contains a list of 20 randomly generated user names, some of which are repeated. We will use these names as part of the data generation function in our producer application:

Miguel,Eric,James,Juan,Shawn,James,Doug,Gary,Frank,Janet,Michael,James,Malinda,Mike,Elaine,Kevin,Janet,Richard,Saul,Manuela
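As a rough illustration of how such a file might feed a data generation function, the sketch below parses the comma-separated line and draws a random user name from it. The object name NameSampler and its helper names are hypothetical, and loadNames assumes that names.csv is available on the classpath at /names.csv (as it would be when placed under src/main/resources in an SBT project).

```scala
import scala.io.Source
import scala.util.Random

object NameSampler {
  // Split one comma-separated line into trimmed, non-empty names.
  def parseNames(line: String): Seq[String] =
    line.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  // Read every line of a classpath resource and collect all names.
  // Assumes the resource exists; the path is an illustrative default.
  def loadNames(resource: String = "/names.csv"): Seq[String] = {
    val source = Source.fromInputStream(getClass.getResourceAsStream(resource))
    try source.getLines().flatMap(line => parseNames(line)).toList
    finally source.close()
  }

  // Pick one name uniformly at random from the list.
  // Names that appear more than once are drawn proportionally more often.
  def randomName(names: Seq[String], random: Random): String =
    names(random.nextInt(names.size))
}
```

Because duplicates are kept rather than removed, a repeated name such as James is more likely to be drawn than a name that appears once. Calling distinct on the parsed list would give every name equal weight instead.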
