Building a Clustering Model with Spark - Machine Learning with Spark

The next cluster is more clearly associated with dramas and, in particular, contains some foreign-language films.

The last cluster

The final cluster seems to be related predominantly to action and thriller movies, as well as romance movies, and appears to contain a number of relatively popular titles.

As you can see, it is not always straightforward to determine exactly what each cluster represents. However, there is some evidence that the clustering is picking out attributes or commonalities among groups of movies that might not be immediately obvious from the movie titles and genres alone (such as a foreign-language segment or a classic-movie segment). If more metadata were available, such as directors and actors, we might learn more about the defining features of each cluster.

Tip

We leave it as an exercise for you to perform a similar investigation into the clustering of the user factors. We have already created the input vectors in the userVectors variable, so you can train a K-means model on these vectors. After that, to evaluate the clusters, you would need to investigate the closest users to each cluster center (as we did for movies) and see whether any common characteristics can be identified from the movies they have rated or from the available user metadata.
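The evaluation step in the Tip (finding the users closest to each cluster center) can be sketched in plain Scala as follows. This is a minimal sketch, not the book's code: it assumes the cluster centers and user factor vectors are already available as local arrays of doubles. With an MLlib KMeansModel trained on userVectors, the centers would come from the model's clusterCenters, and its predict method gives the same nearest-center assignment. The object and function names (ClosestUsersSketch, closestUsersPerCluster, nearestCenter) and the toy vectors are hypothetical, chosen only for illustration. Squared Euclidean distance is used because it ranks points in the same order as Euclidean distance while avoiding the square root.

```scala
object ClosestUsersSketch {
  // Squared Euclidean distance between two dense vectors of equal length.
  def squaredDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    a.indices.map { i => val d = a(i) - b(i); d * d }.sum
  }

  // Index of the center nearest to a single factor vector.
  def nearestCenter(v: Array[Double], centers: Array[Array[Double]]): Int =
    centers.indices.minBy(c => squaredDistance(v, centers(c)))

  // For each cluster, the (userId, distance) pairs of the n assigned users
  // closest to that cluster's center, sorted by increasing distance.
  // On the real user factors this would run over an RDD rather than a local Seq.
  def closestUsersPerCluster(
      users: Seq[(Int, Array[Double])],
      centers: Array[Array[Double]],
      n: Int): Map[Int, Seq[(Int, Double)]] = {
    users
      .map { case (id, v) =>
        val c = nearestCenter(v, centers)
        (c, id, squaredDistance(v, centers(c)))
      }
      .groupBy(_._1)
      .map { case (cluster, members) =>
        cluster -> members.sortBy(_._3).take(n).map(m => (m._2, m._3))
      }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical 2-dimensional "user factors"; not from the MovieLens model.
    val users = Seq(
      1 -> Array(0.1, 0.2),
      2 -> Array(0.0, 0.0),
      3 -> Array(5.0, 5.1),
      4 -> Array(4.8, 5.0),
      5 -> Array(0.3, 0.1)
    )
    // Hypothetical centers standing in for a trained model's cluster centers.
    val centers = Array(Array(0.0, 0.0), Array(5.0, 5.0))
    val closest = closestUsersPerCluster(users, centers, n = 2)
    closest.toSeq.sortBy(_._1).foreach { case (cluster, members) =>
      val desc = members.map { case (id, d) => f"user $id ($d%.2f)" }.mkString(", ")
      println(s"cluster $cluster: $desc")
    }
  }
}
```

Running main lists, for each toy cluster, the two nearest users and their squared distances. With real data, the next step would be to look up the movies those users rated (or their user metadata) and look for shared characteristics.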
