Building a Clustering Model with Spark - Machine Learning with Spark

Initialization methods

The standard initialization method for K-means, usually referred to simply as the random method, starts by randomly assigning each data point to a cluster before proceeding with the first update step.
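
The random method described above can be sketched in a few lines of plain Python. This is a minimal illustration, not MLlib code: the function name random_partition_init, the seed parameter, and the sample points are assumptions made for the example, and the fallback for an empty cluster is one possible choice among several.

```python
import random

def random_partition_init(points, k, seed=0):
    """Assign every point to a random cluster and return the k centroids.

    Hypothetical helper for illustration only; not part of MLlib.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        c = rng.randrange(k)  # random cluster label for this point
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    centers = []
    for c in range(k):
        if counts[c] == 0:
            # Assumption: an empty cluster falls back to a random data point.
            centers.append(list(rng.choice(points)))
        else:
            # Centroid of the points that were assigned to cluster c.
            centers.append([s / counts[c] for s in sums[c]])
    return centers

# Made-up sample data: three loose groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (9.0, 0.1), (9.1, 0.3)]
centers = random_partition_init(points, k=3)
```

The resulting centers are the starting point for the first update step. Because the labels are random, the initial centroids tend to land near the overall mean of the data, which is one reason different seeds can lead to quite different final clusterings.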

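An alternative, K-means++, chooses the initial centers one at a time, favoring points that are far from the centers already chosen. MLlib provides a parallel variant of this initialization method, called K-means||, which is the default initialization method used.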

Note

See http://en.wikipedia.org/wiki/K-means%2B%2B for more information.
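
To make the idea behind K-means++ concrete, here is a minimal sketch of its sequential seeding step in plain Python. It is not the K-means|| variant that MLlib runs and it does not use any MLlib API. The names sq_dist and kmeans_pp_init, the seed parameter, and the sample points are all assumptions introduced for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers using sequential K-means++ seeding.

    Hypothetical helper for illustration only; not part of MLlib.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform choice.
            centers.append(rng.choice(points))
        else:
            # Next center: sampled with probability proportional to d2,
            # so far-away points are more likely to be picked.
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

# Made-up sample data: three loose groups of 2-D points.
sample = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (9.0, 0.1), (9.1, 0.3)]
seeds = kmeans_pp_init(sample, k=3)
```

Each new center requires a full pass over the data, and the passes must happen one after another. That sequential dependence is what a parallel variant has to work around on a distributed dataset.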

The results of using K-means++ are shown here. Note that this time, the difficult lower-right points have been mostly correctly clustered.
