Extracting features from the MovieLens 100k dataset
Start the Spark shell in the Spark base directory, ensuring that you provide enough memory via the --driver-memory option:
>./bin/spark-shell --driver-memory 4g
In this example, we will use the same MovieLens dataset that we used in the previous chapter. Use the directory in which you placed the MovieLens 100k dataset as the input path in the following code.
First, let's inspect the raw ratings dataset:
val rawData = sc.textFile("/PATH/ml-100k/u.data")
rawData.first()
You should see output similar to the following:
14/03/30 11:42:41 WARN NativeCodeLoader: Unable to load
native-hadoop library for your platform... using
builtin-java classes where applicable
14/03/30 11:42:41 WARN LoadSnappy: Snappy native library not
loaded
14/03/30 11:42:41 INFO FileInputFormat: Total input paths to
process : 1
14/03/30 11:42:41 INFO SparkContext: Starting job: first at
<console>:15
14/03/30 11:42:41 INFO DAGScheduler: Got job 0 (first at
<console>:15) with 1 output partitions (allowLocal=true)
14/03/30 11:42:41 INFO DAGScheduler: Final stage: Stage 0
(first at <console>:15)
14/03/30 11:42:41 INFO DAGScheduler: Parents of final stage:
List()
14/03/30 11:42:41 INFO DAGScheduler: Missing parents: List()
14/03/30 11:42:41 INFO DAGScheduler: Computing the requested
partition locally
14/03/30 11:42:41 INFO HadoopRDD: Input split: file:/Users/
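After the logging output, first() returns the first record of the file. Each record in u.data is a tab-separated line containing the user ID, movie ID, rating, and timestamp. As a minimal sketch, assuming this standard u.data layout, you could split each record and keep only the first three fields (the timestamp is not needed for the ratings themselves):

// Split each tab-separated record and keep the user ID, movie ID, and rating fields;
// assumes the standard MovieLens 100k u.data format
val rawRatings = rawData.map(_.split("\t").take(3))
rawRatings.first()
// expected: an Array[String] such as Array(196, 242, 3)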