The first step to a Spark program in Scala
We will now use the ideas introduced in the previous section to write a basic Spark program that manipulates a dataset. We will start with Scala and then write the same program in Java and Python. Our program will explore some data from an online store, recording which users have purchased which products. The data is contained in a comma-separated value (CSV) file called UserPurchaseHistory.csv, and the contents are shown in the following snippet. The first column of the CSV is the username, the second column is the product name, and the final column is the price:
John,iPhone Cover,9.99
John,Headphones,5.49
Jack,iPhone Cover,9.99
Jill,Samsung Galaxy Cover,8.95
Bob,iPad Cover,5.49
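Before we bring Spark into the picture, the per-record parsing that our program will apply to each line can be sketched in plain Scala. This is only an illustration, not the chapter's program: the `Purchase` case class and `parseLine` helper are our own names, introduced here to show how the three CSV fields map to typed values.

```scala
// A hypothetical sketch of the per-line parsing logic, in plain Scala.
// Purchase and parseLine are illustrative names, not part of the chapter's code.
case class Purchase(user: String, product: String, price: Double)

def parseLine(line: String): Purchase = {
  // Split "user,product,price" into its three fields
  val fields = line.split(",")
  Purchase(fields(0), fields(1), fields(2).toDouble)
}

val lines = Seq(
  "John,iPhone Cover,9.99",
  "John,Headphones,5.49",
  "Jack,iPhone Cover,9.99"
)
val purchases = lines.map(parseLine)
val total = purchases.map(_.price).sum
println(f"Total revenue: $$$total%.2f")
```

In the Spark version of the program, the same `map`-style transformations run over an RDD of lines instead of a local `Seq`.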
For our Scala program, we need to create two files: our Scala code and our project build configuration file, using the Scala Build Tool (sbt). For ease of use, we recommend that you download the sample project code called scala-spark-app for this chapter. This code also contains the CSV file under the data directory. You will need sbt installed on your system in order to run this example program (we use version 0.13.1 at the time of writing this book).
Tip
Setting up sbt is beyond the scope of this book; however, you can find more information at http://www.scala-sbt.org/release/docs/Getting-Started/Setup.html.
Our sbt configuration file, build.sbt, looks like this (note that the empty lines between each line of code are required):

name := "scala-spark-app"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"