Building a Clustering Model with Spark - Machine Learning with Spark


14/09/02 21:53:58 INFO KMeans: KMeans reached the max number of iterations: 10.
14/09/02 21:53:58 INFO KMeans: The cost for the best run is 2586.298785925147.
...
movieClusterModel: org.apache.spark.mllib.clustering.KMeansModel = org.apache.spark.mllib.clustering.KMeansModel@71c6f512

As the log output shows, model training reached the maximum number of iterations, so the training process did not stop early based on the convergence criterion. The output also shows the training set error (that is, the value of the K-means objective function) for the best run.
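To make the "cost" figure in the log concrete, here is a minimal, Spark-free sketch of the K-means objective: the sum, over all points, of the squared distance to the nearest cluster center. The object name, the helper functions, and the toy data are illustrative assumptions and are not taken from the book or from MLlib's source.

```scala
// Minimal sketch of the K-means objective (within-cluster sum of squares).
// The points and centers below are made-up toy values, not the movie vectors.
object KMeansCostSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Cost = sum over all points of the squared distance to the nearest center.
  def cost(points: Seq[Point], centers: Seq[Point]): Double =
    points.map(p => centers.map(c => squaredDistance(p, c)).min).sum

  def main(args: Array[String]): Unit = {
    val points = Seq(Vector(0.0, 0.0), Vector(0.0, 2.0), Vector(10.0, 10.0))
    val centers = Seq(Vector(0.0, 1.0), Vector(10.0, 10.0))
    println(cost(points, centers)) // 1.0 + 1.0 + 0.0 = 2.0
  }
}
```

A lower value means the points sit closer to their assigned centers. On a trained MLlib model, the analogous figure for an RDD of vectors should be obtainable with the model's computeCost method.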

We can try a much larger setting for the maximum iterations and use only one training run to see an example where the K-means model converges:

val movieClusterModelConverged = KMeans.train(movieVectors, numClusters, 100)

You should be able to see the "KMeans converged in ... iterations" text in the model output. This text indicates that, after that many iterations, the cluster centers moved by less than the tolerance level, so further iterations would no longer meaningfully decrease the K-means objective function:

...
14/09/02 22:04:38 INFO SparkContext: Job finished: collectAsMap at KMeans.scala:193, took 0.040685 s
14/09/02 22:04:38 INFO KMeans: Run 0 finished in 34 iterations
14/09/02 22:04:38 INFO KMeans: Iterations took 0.812 seconds.
14/09/02 22:04:38 INFO KMeans: KMeans converged in 34 iterations.
14/09/02 22:04:38 INFO KMeans: The cost for the best run is 2584.9354332904104.
...

movieClusterModelConverged:
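The difference between the two runs (stopping at the iteration cap versus converging) can be illustrated with a small, Spark-free K-means loop. This is a sketch under stated assumptions, not MLlib's implementation: the object name, the run function, the toy data, and the choice of a center-movement tolerance as the convergence test are all illustrative.

```scala
// Minimal sketch of the two stopping conditions: iteration cap vs. tolerance.
// Not MLlib's code; all names and the toy data are hypothetical.
object KMeansStoppingSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Component-wise mean of a non-empty group of points.
  def mean(group: Seq[Point]): Point =
    group.reduce((a, b) => a.zip(b).map { case (x, y) => x + y }).map(_ / group.size)

  // Returns (final centers, iterations used, whether the tolerance was met).
  def run(points: Seq[Point], initial: Seq[Point],
          maxIterations: Int, epsilon: Double): (Seq[Point], Int, Boolean) = {
    var centers = initial
    var iteration = 0
    var converged = false
    while (iteration < maxIterations && !converged) {
      // Assignment step: group points by the index of their nearest center.
      val groups = points.groupBy { p =>
        centers.indices.minBy(i => squaredDistance(p, centers(i)))
      }
      // Update step: move each center to the mean of its points (keep it if empty).
      val updated = centers.indices.map(i => groups.get(i).map(mean).getOrElse(centers(i)))
      // Converged when no center moved farther than epsilon.
      converged = centers.zip(updated).forall { case (oldC, newC) =>
        squaredDistance(oldC, newC) <= epsilon * epsilon
      }
      centers = updated
      iteration += 1
    }
    (centers, iteration, converged)
  }

  def main(args: Array[String]): Unit = {
    val points = Seq(Vector(0.0, 0.0), Vector(0.0, 2.0), Vector(10.0, 10.0), Vector(10.0, 12.0))
    val initial = Seq(Vector(0.0, 0.0), Vector(10.0, 10.0))
    val (_, capped, metTolerance1) = run(points, initial, maxIterations = 1, epsilon = 1e-4)
    println(s"cap of 1: $capped iteration(s), converged = $metTolerance1")
    val (_, used, metTolerance2) = run(points, initial, maxIterations = 100, epsilon = 1e-4)
    println(s"cap of 100: $used iteration(s), converged = $metTolerance2")
  }
}
```

With a cap of 1, the loop stops because it runs out of iterations, which mirrors the first training run. With a cap of 100, it stops as soon as the centers settle, which mirrors the converged run.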
