tasks from Stage 165 (FlatMappedRDD[883] at flatMap at
ALS.scala:231)
...
14/03/30 13:15:21 INFO SparkContext: Job finished: count at
<console>:26, took 0.030044 s
res21: Long = 1682
As expected, we have a factor array for each user (943 factors) and movie (1682 factors).
Training a model using implicit feedback data
The standard matrix factorization approach in MLlib deals with explicit ratings. To work with implicit data, you can use the trainImplicit method. It is called in a manner similar to the standard train method, but takes an additional parameter, alpha, that can be set (and, as with the standard method, the regularization parameter, lambda, should be selected through testing and cross-validation).
The alpha parameter controls the baseline level of confidence weighting applied. A higher value of alpha tends to make the model more confident that missing data implies no preference for the relevant user-item pair.
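For reference, a call to trainImplicit might look like the following sketch. The rank, iteration, lambda, and alpha values here are purely illustrative, and ratings is assumed to be an RDD[Rating] whose rating field holds implicit feedback values (such as view or purchase counts) rather than explicit ratings:
import org.apache.spark.mllib.recommendation.{ALS, Rating}
// `ratings` is assumed to be an RDD[Rating] of implicit feedback values
val rank = 50          // number of latent factors
val iterations = 10    // number of ALS iterations
val lambda = 0.01      // regularization parameter
val alpha = 1.0        // confidence weighting parameter
val implicitModel = ALS.trainImplicit(ratings, rank, iterations, lambda, alpha)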
Note
As an exercise, try to take the existing MovieLens dataset and convert it into an implicit dataset. One possible approach is to convert it to binary feedback (0s and 1s) by applying a threshold to the ratings at some level (a rough sketch of this conversion follows this note).
Another approach could be to convert the rating values into confidence weights (for example, low ratings could imply zero weights, or even negative weights, which are supported by MLlib's implementation).
Train a model on this dataset and compare the results of the following section with those
generated by your implicit model.
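As a rough starting point for the exercise, a minimal sketch of the binary-feedback conversion might look like this. The threshold of 4, the training parameters, and the ratings RDD of Rating objects are assumptions for illustration only:
import org.apache.spark.mllib.recommendation.{ALS, Rating}
// Ratings of 4 or higher become a positive signal (1.0); everything else becomes 0.0
val implicitRatings = ratings.map { r =>
  Rating(r.user, r.product, if (r.rating >= 4.0) 1.0 else 0.0)
}
// Train an implicit model on the converted data using illustrative parameters
val binaryModel = ALS.trainImplicit(implicitRatings, 50, 10, 0.01, 1.0)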