3.4.2 Partition Clustering (K-Means Clustering)
Partition methods are among the most popular clustering techniques; they break the objects or observations into distinct, non-overlapping groups according to some fixed criteria. Many partition methods are available in the literature, based on criteria such as K-means, K-medians, etc. Some common non-hierarchical methods, apart from the K-means method, are nucleated agglomerative clustering and clustering using mixture distributions. K-means clustering is the best-known non-hierarchical clustering technique for data grouping and is also known as Forgy's algorithm. The K-means algorithm was developed by MacQueen [50], who was inspired by the work of Steinhaus [70]; it was later enhanced by J.A. Hartigan and M.A. Wong during 1975–1977. K-means clustering is particularly appropriate if the number of clusters can be specified a priori, before the clustering process begins. Unlike hierarchical CA, which looks for n possible clusterings of the observations, the researcher must provide the number of required clusters, after which the K-means CA program searches for the best solution with that number of clusters. This could be viewed as a disadvantage unless prior knowledge of the correct number of groups exists. For this reason, K-means clustering is often repeated for a range of initial group numbers and group assignments, as sketched below.
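For example, a minimal sketch of this practice in Python, assuming scikit-learn is available; the synthetic data and the candidate range of 2-7 groups are illustrative assumptions, not taken from the text:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # placeholder data; substitute real observations

# Repeat K-means for a range of candidate group numbers and compare the
# within-cluster sum of squares (inertia) of each solution
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}: within-cluster sum of squares = {km.inertia_:.1f}")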
The algorithm then proceeds in the following way:

1. The centroid x̄_k of each cluster is calculated.
2. The distance between the data point x_i and all the centroids is calculated.
3. If x_i is already a member of the group whose mean is closest, repeat step 2 for x_{i+1}; otherwise, reassign x_i to the group whose mean is closest and return to step 1.

This is repeated until every x_i is closest to its own group mean. The K-means clustering algorithm is described simply in Fig. 3.5.
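A minimal from-scratch sketch of these three steps, assuming Euclidean distance and randomly chosen initial centroids (the text does not prescribe an initialization):

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initial centroids: k distinct observations chosen at random (an
    # assumed initialization; the text does not prescribe one)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: distances between every data point x_i and all centroids
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Step 3: assign each x_i to the group whose mean is closest
        labels = dists.argmin(axis=1)
        # Step 1: recompute the centroid (mean) of each cluster
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  if np.any(labels == j) else centroids[j]
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # every x_i is already closest to its own group mean
        centroids = new_centroids
    return labels, centroids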
In the case of nucleated agglomerative clustering, the procedures of K-means clustering and Ward's method are combined to produce a better grouping. In this method, two numbers of groups must be specified: an initial number of groups and a final number of groups. The initial number of groups is specified to be larger than the final number of groups [9]. The procedure starts with K-means CA, after which the two closest groups are merged according to Ward's method (described in Sect. 3.4.1.5).
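A rough sketch of this two-stage procedure, again assuming scikit-learn; applying Ward-linkage agglomeration to the K-means centroids is an illustrative shortcut (it ignores the sizes of the initial groups) rather than the reference implementation of [9]:

from sklearn.cluster import KMeans, AgglomerativeClustering

def nucleated_clustering(X, initial_k, final_k, seed=0):
    # Stage 1: K-means with the (larger) initial number of groups
    km = KMeans(n_clusters=initial_k, n_init=10, random_state=seed).fit(X)
    # Stage 2: merge the closest groups with Ward's method until only
    # final_k groups remain (here applied to the K-means centroids)
    ward = AgglomerativeClustering(n_clusters=final_k, linkage="ward")
    merged = ward.fit_predict(km.cluster_centers_)
    # Map each observation to the merged group of its initial cluster
    return merged[km.labels_]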
Clustering using mixture distributions is a method in which multivariate normal distributions are fitted to the data via the Expectation–Maximization (EM) algorithm [23, 29, 53, 75]. Each group is represented by a probability density function (PDF). The probability of membership of a data point in a group, given its PDF, is obtained using the EM algorithm. The data points are grouped by assigning each x_i to the PDF f_g(x_i) that has the largest probability.
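A minimal sketch of this mixture-based grouping, assuming scikit-learn's GaussianMixture (which runs EM internally); the two-component synthetic data are an illustrative assumption:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),   # illustrative two-group data
               rng.normal(4, 1, size=(100, 2))])

# EM fits one multivariate normal PDF per group
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
probs = gm.predict_proba(X)   # membership probability of each x_i in each group
labels = gm.predict(X)        # assign x_i to the PDF f_g(x_i) with largest probability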
As with many clustering approaches, the K-means clustering method also has weaknesses. They are as follows: