The last line adds the dependency on Spark to our project.
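For reference, a minimal build.sbt along these lines would provide that dependency; the project name and the Scala and Spark versions shown here are only placeholders, so adjust them to match your setup:
name := "scala-spark-app"
version := "1.0"
scalaVersion := "2.10.4"
// the last line adds the dependency on Spark to our project;
// the version number here is a placeholder
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"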
Our Scala program is contained in the ScalaApp.scala file. We will walk through the
program piece by piece. First, we need to import the required Spark classes:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
/**
* A simple Spark app in Scala
*/
object ScalaApp {
In our main method, we need to initialize our SparkContext object and use this to access our CSV data file with the textFile method. We will then map the raw text by splitting the string on the delimiter character (a comma in this case) and extracting the relevant records for username, product, and price:
def main(args: Array[String]) {
  val sc = new SparkContext("local[2]", "First Spark App")
  // we take the raw data in CSV format and convert it into a
  // set of records of the form (user, product, price)
  val data = sc.textFile("data/UserPurchaseHistory.csv")
    .map(line => line.split(","))
    .map(purchaseRecord => (purchaseRecord(0),
      purchaseRecord(1), purchaseRecord(2)))
Now that we have an RDD, where each record is made up of (user, product, price), we can compute various interesting metrics for our store, such as the following:
• The total number of purchases
• The number of unique users who purchased
• Our total revenue
• Our most popular product
Let's compute the preceding metrics:
// let's count the number of purchases
val numPurchases = data.count()
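The remaining metrics can be computed along the following lines, assuming the (user, product, price) tuple layout we created earlier; the variable names here are our own:
// let's count how many unique users made purchases
val uniqueUsers = data.map { case (user, product, price) => user }.distinct().count()
// let's sum up our total revenue; price is still a string, so convert it first
val totalRevenue = data.map { case (user, product, price) => price.toDouble }.sum()
// let's find our most popular product by counting purchases per product
val productsByPopularity = data
  .map { case (user, product, price) => (product, 1) }
  .reduceByKey(_ + _)
  .collect()
  .sortBy(-_._2)
val mostPopular = productsByPopularity(0)
Note that distinct, sum, and reduceByKey all become available on our RDDs through the implicits brought in by the import org.apache.spark.SparkContext._ statement at the top of the file.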