The last line adds the dependency on Spark to our project.
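For reference, a minimal build.sbt along these lines would provide that dependency; the project name and the Scala and Spark versions shown here are only placeholders, so adjust them to match your setup:
name := "scala-spark-app"
version := "1.0"
scalaVersion := "2.10.4"
// the last line adds the dependency on Spark to our project;
// the version number here is a placeholder
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"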
Our Scala program is contained in the ScalaApp.scala file. We will walk through the
program piece by piece. First, we need to import the required Spark classes:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
/**
* A simple Spark app in Scala
*/
object ScalaApp {
In our main method, we need to initialize our SparkContext object and use this to access our CSV data file with the textFile method. We will then map the raw text by splitting the string on the delimiter character (a comma in this case) and extracting the relevant records for username, product, and price:
def main(args: Array[String]) {
  val sc = new SparkContext("local[2]", "First Spark App")
  // we take the raw data in CSV format and convert it into a
  // set of records of the form (user, product, price)
  val data = sc.textFile("data/UserPurchaseHistory.csv")
    .map(line => line.split(","))
    .map(purchaseRecord => (purchaseRecord(0),
      purchaseRecord(1), purchaseRecord(2)))
Now that we have an RDD, where each record is made up of (user, product, price), we can compute various interesting metrics for our store, such as the following:
• The total number of purchases
• The number of unique users who purchased
• Our total revenue
• Our most popular product
Let's compute the preceding metrics:
// let's count the number of purchases
val numPurchases = data.count()
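The remaining metrics can be computed along the following lines, assuming the (user, product, price) tuple layout we created earlier; the variable names here are our own:
// let's count how many unique users made purchases
val uniqueUsers = data.map { case (user, product, price) => user }.distinct().count()
// let's sum up our total revenue; price is still a string, so convert it first
val totalRevenue = data.map { case (user, product, price) => price.toDouble }.sum()
// let's find our most popular product by counting purchases per product
val productsByPopularity = data
  .map { case (user, product, price) => (product, 1) }
  .reduceByKey(_ + _)
  .collect()
  .sortBy(-_._2)
val mostPopular = productsByPopularity(0)
Note that distinct, sum, and reduceByKey all become available on our RDDs through the implicits brought in by the import org.apache.spark.SparkContext._ statement at the top of the file.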