Figure 4.4 Compute the mean of each cluster
4.2.3 Determining the Number of Clusters
With the preceding algorithm, k clusters can be identified in a given dataset, but
what value of k should be selected? The value of k can be chosen based on a
reasonable guess or some predefined requirement. However, even then, it would
be good to know how much better or worse having k clusters versus k - 1 or k +
1 clusters would be in explaining the structure of the data. Next, a heuristic using
the Within Sum of Squares (WSS) metric is examined to determine a reasonably
optimal value of k. Using the distance function given in Equation 4.3, WSS is
defined as shown in Equation 4.5.
\[ WSS = \sum_{i=1}^{M} d(p_i, q^{(i)})^2 \tag{4.5} \]
In other words, WSS is the sum of the squares of the distances between each data
point and the closest centroid. Here M is the number of data points, p_i is the
ith point, and the term q^{(i)} indicates the closest centroid that is
associated with the ith point. If the points are relatively close to their respective
centroids, the WSS is relatively small. Thus, if k + 1 clusters do not greatly reduce
the value of WSS from the case with only k clusters, there may be little benefit to
adding another cluster.
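The heuristic can be illustrated with a minimal sketch in Python. Everything named here is an assumption made for illustration and does not come from the text: the helper functions squared_distance, wss, and kmeans, the toy data, and the choice to seed the centroids by random sampling. The sketch runs a basic k-means for several values of k and prints the resulting WSS so that the drop from k to k + 1 clusters can be compared.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((pj - qj) ** 2 for pj, qj in zip(p, q))

def wss(points, centroids):
    """Sum over all points of the squared distance to the closest centroid."""
    return sum(min(squared_distance(p, q) for q in centroids) for p in points)

def kmeans(points, k, iterations=50, seed=0):
    """Basic k-means iterations (assumed setup); returns the final centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k sampled points
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster;
        # keep the previous centroid if a cluster ended up empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[c]
            for c, cluster in enumerate(clusters)
        ]
    return centroids

# Hypothetical toy data: three loosely separated groups in two dimensions.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
          (9.0, 1.0), (9.2, 1.1), (8.8, 0.9)]

# WSS for k = 1 through 5; look for the k after which the decrease levels off.
wss_by_k = {k: wss(points, kmeans(points, k)) for k in range(1, 6)}
for k, value in wss_by_k.items():
    print(k, round(value, 3))
```

Because the starting centroids are sampled at random, a single run can settle on a poor local solution. In practice it may help to repeat the run with several seeds for each k and keep the smallest WSS before comparing values across k.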