Cluster Analysis: Basic Concepts and Methods - Data Mining: Concepts and Techniques - page 453

Databases Reference

In-Depth Information

+

+

+

+

+

+

+

+

+

(a) Initial clustering

(b) Iterate

(c) Final clustering

Figure10.3 Clustering of a set of objects using the k -means method; for (b) update cluster centers and

reassign objects accordingly (the mean of each cluster is marked by a C).

Example10.1 Clustering by k -means partitioning. Consider a set of objects located in 2-D space,

as depicted in Figure 10.3(a). Let k D 3, that is, the user would like the objects to be

partitioned into three clusters.

According to the algorithm in Figure 10.2, we arbitrarily choose three objects as

the three initial cluster centers, where cluster centers are marked by a C. Each object

is assigned to a cluster based on the cluster center to which it is the nearest. Such a

distribution forms silhouettes encircled by dotted curves, as shown in Figure 10.3(a).

Next, the cluster centers are updated. That is, the mean value of each cluster is recal-

culated based on the current objects in the cluster. Using the new cluster centers, the

objects are redistributed to the clusters based on which cluster center is the nearest.

Such a redistribution forms new silhouettes encircled by dashed curves, as shown in

Figure 10.3(b).

This process iterates, leading to Figure 10.3(c). The process of iteratively reassigning

objects to clusters to improve the partitioning is referred to as iterative relocation . Even-

tually, no reassignment of the objects in any cluster occurs and so the process terminates.

The resulting clusters are returned by the clustering process.

The k -means method is not guaranteed to converge to the global optimum and often

terminates at a local optimum. The results may depend on the initial random selection

of cluster centers. (You will be asked to give an example to show this as an exercise.)

To obtain good results in practice, it is common to run the k -means algorithm multiple

times with different initial cluster centers.

The time complexity of the k -means algorithm is O

, where n is the total number

of objects, k is the number of clusters, and t is the number of iterations. Normally, k n

and t n . Therefore, the method is relatively scalable and efficient in processing large

data sets.

There are several variants of the k -means method. These can differ in the selection

of the initial k -means, the calculation of dissimilarity, and the strategies for calculating

cluster means.

.

nkt

/

Next Page

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home