Database Reference
In-Depth Information
4.2 K-means
Given a collection of objects each with n measurable attributes, k-means [1] is an
analytical technique that, for a chosen value of k, identifies k clusters of objects
based on the objects' proximity to the center of the k groups. The center is
determined as the arithmetic average (mean) of each cluster's n-dimensional vector
of attributes. This section describes the algorithm to determine the k means as well
as how best to apply this technique to several use cases. Figure 4.1 illustrates three
clusters of objects with two attributes. Each object in the dataset is represented by a
small dot color-coded to the closest large dot, the mean of the cluster.
Figure 4.1 Possible k-means clusters for k=3
4.2.1 Use Cases
Clustering is often used as a lead-in to classification. Once the clusters are identified,
labels can be applied to each cluster to classify each group based on its
characteristics. Classification is covered in more detail in Chapter 7, “Advanced
Analytical Theory and Methods: Classification.” Clustering is primarily an
exploratory technique to discover hidden structures of the data, possibly as a
prelude to more focused analysis or decision processes. Some specific applications
of k-means are image processing, medical, and customer segmentation.
Search WWH ::




Custom Search