Initialization methods
The standard initialization method for K-means, usually simply referred to as the random
method, starts by randomly assigning each data point to a cluster before proceeding with
the first update step.
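The random method described above can be sketched in plain Python as follows. This is not MLlib code. It assigns every point to a uniformly random cluster and then performs the first update step, setting each centre to the mean of its assigned points. The function name `random_partition_init` is hypothetical, and re-seeding an empty cluster with a random data point is an assumption added for this example.

```python
import random


def random_partition_init(points, k, seed=None):
    """Hypothetical helper: random assignment followed by one update step.

    points: non-empty list of equal-length tuples of floats.
    Returns k centres as tuples. A cluster that receives no points
    (possible with small data sets) is re-seeded with a random point,
    which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    # Random assignment: each point gets a cluster index in [0, k).
    for p in points:
        c = rng.randrange(k)
        counts[c] += 1
        for j, x in enumerate(p):
            sums[c][j] += x
    # First update step: each centre becomes the mean of its points.
    centres = []
    for c in range(k):
        if counts[c] == 0:
            centres.append(tuple(rng.choice(points)))
        else:
            centres.append(tuple(s / counts[c] for s in sums[c]))
    return centres


data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
print(random_partition_init(data, k=2, seed=42))
```

Because the assignment ignores the geometry of the data, the initial centres tend to land near the overall mean. This is one reason smarter seeding schemes are often preferred.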
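Another widely used approach is K-means++, which chooses the initial centres one at a
time and favours points that lie far from the centres already chosen. MLlib provides a
parallel variant of K-means++ initialization, called K-means ||, which is the default
initialization method used.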
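K-means || is a scalable version of K-means++ seeding. The plain-Python sketch below shows only the serial K-means++ idea that it builds on, and it is not MLlib's implementation. The first centre is a uniformly random point. Each further centre is drawn with probability proportional to its squared distance from the nearest centre chosen so far. The function names are hypothetical, and the uniform fallback for the case where every remaining distance is zero is an assumption of this sketch.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plus_plus_init(points, k, seed=None):
    """Hypothetical helper: choose k initial centres by K-means++ (D^2) seeding."""
    rng = random.Random(seed)
    centres = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen centre.
    d2 = [squared_distance(p, centres[0]) for p in points]
    while len(centres) < k:
        if sum(d2) == 0.0:
            # Every point coincides with a centre; fall back to a uniform pick.
            nxt = rng.choice(points)
        else:
            # Sample a point with probability proportional to its D^2 weight;
            # points already used as centres have weight 0 and are never drawn.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centres.append(nxt)
        # Keep each point's distance to its nearest centre up to date.
        d2 = [min(d, squared_distance(p, nxt)) for p, d in zip(points, d2)]
    return centres


data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
print(kmeans_plus_plus_init(data, k=2, seed=7))
```

The loop above needs k sequential passes over the data, which is expensive on a cluster. K-means || instead runs a small number of rounds. In each round it samples many candidate centres in parallel with probability proportional to this squared distance, and it then reduces the candidates to k centres. In MLlib's Python API, the scheme is selected with the `initializationMode` argument of `KMeans.train` (`"random"` or `"k-means||"`).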
Note
See http://en.wikipedia.org/wiki/K-means%2B%2B
for more information.
The results of using K-means++ are shown here. Note that this time, the difficult lower-
right points have been mostly correctly clustered.