tasks from Stage 165 (FlatMappedRDD[883] at flatMap at
ALS.scala:231)
...
14/03/30 13:15:21 INFO SparkContext: Job finished: count at
<console>:26, took 0.030044 s
res21: Long = 1682
As expected, we have a factor array for each user (943 factors) and movie (1682 factors).
Training a model using implicit feedback data
The standard matrix factorization approach in MLlib deals with explicit ratings. To work with implicit data, you can use the trainImplicit method. It is called in a manner similar to the standard train method, but takes an additional parameter, alpha, that can be set (and, as with the standard method, the regularization parameter, lambda, should be selected through testing and cross-validation).
The alpha parameter controls the baseline level of confidence weighting applied. A higher value of alpha tends to make the model more confident that missing data implies no preference for the relevant user-item pair.
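For reference, a call to trainImplicit might look like the following sketch. The rank, iteration, lambda, and alpha values here are purely illustrative, and ratings is assumed to be an RDD[Rating] whose rating field holds implicit feedback values (such as view or purchase counts) rather than explicit ratings:
import org.apache.spark.mllib.recommendation.{ALS, Rating}
// `ratings` is assumed to be an RDD[Rating] of implicit feedback values
val rank = 50          // number of latent factors
val iterations = 10    // number of ALS iterations
val lambda = 0.01      // regularization parameter
val alpha = 1.0        // confidence weighting parameter
val implicitModel = ALS.trainImplicit(ratings, rank, iterations, lambda, alpha)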
Note
As an exercise, try to take the existing MovieLens dataset and convert it into an implicit dataset. One possible approach is to convert it to binary feedback (0s and 1s) by applying a threshold to the ratings at some level (a rough sketch of this conversion follows this note).
Another approach could be to convert the rating values into confidence weights (for example, low ratings could imply zero weights, or even negative weights, which are supported by MLlib's implementation).
Train a model on this dataset and compare the results of the following section with those
generated by your implicit model.
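As a rough starting point for the exercise, a minimal sketch of the binary-feedback conversion might look like this. The threshold of 4, the training parameters, and the ratings RDD of Rating objects are assumptions for illustration only:
import org.apache.spark.mllib.recommendation.{ALS, Rating}
// Ratings of 4 or higher become a positive signal (1.0); everything else becomes 0.0
val implicitRatings = ratings.map { r =>
  Rating(r.user, r.product, if (r.rating >= 4.0) 1.0 else 0.0)
}
// Train an implicit model on the converted data using illustrative parameters
val binaryModel = ALS.trainImplicit(implicitRatings, 50, 10, 0.01, 1.0)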