Attribute2 against Attribute3. If the value of Attribute2 is known, there is
still a wide range of possible values for Attribute3. Thus, greater consideration
must be given prior to dropping one of these attributes from the clustering
analysis.
Another option to reduce the number of attributes is to combine several attributes
into one measure. For example, instead of using two attribute variables, one for
Debt and one for Assets, a Debt-to-Asset ratio could be used. This option also
helps when the absolute magnitude of an attribute is of less interest than its
magnitude relative to another attribute.
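As a minimal sketch of this idea, the snippet below collapses two attributes into a single ratio. The customer records and the names customers and debt_to_asset are hypothetical, chosen only for illustration.

```python
# Hypothetical customer records (illustrative values, not from any real dataset).
customers = [
    {"debt": 50_000, "assets": 200_000},
    {"debt": 5_000, "assets": 20_000},
    {"debt": 150_000, "assets": 200_000},
]

# Replace the two attributes (debt, assets) with one combined measure.
debt_to_asset = [c["debt"] / c["assets"] for c in customers]

print(debt_to_asset)
```

The first two customers differ tenfold in absolute debt and assets yet share the same ratio, so a clustering on the ratio alone would treat them as identical, while the third customer stands apart.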
Units of Measure
From a computational perspective, the k-means algorithm is indifferent to the
units of measure for a given attribute (for example, meters or centimeters
for a patient's height): it runs the same way in either case. However, the
clusters it identifies can depend on the choice of units, because the units
determine how much each attribute contributes to the distance between points.
For example, suppose that
k-means is used to cluster patients based on age in years and height in centimeters.
For k=2, Figure 4.11 illustrates the two clusters that would be determined for a
given dataset.
Figure 4.11 Clusters with height expressed in centimeters
But if the height were rescaled from centimeters to meters by dividing by 100, the
resulting clusters would be slightly different, as illustrated in Figure 4.12.
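The effect of rescaling can be reproduced with a small sketch. The code below is a plain implementation of Lloyd's k-means written for illustration, not the procedure used to produce the figures. The four patient records are hypothetical and deliberately chosen to make the effect stark, and seeding the centroids with the first k points is an assumption made so the run is deterministic.

```python
def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Assumption: the first k points seed the centroids, so results are repeatable.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


# Hypothetical patients as (age in years, height in centimeters).
patients_cm = [(30, 150), (40, 152), (30, 190), (40, 188)]
# Same patients with height rescaled to meters.
patients_m = [(age, height / 100) for age, height in patients_cm]

labels_cm = kmeans(patients_cm, 2)
labels_m = kmeans(patients_m, 2)
print("height in cm:", labels_cm)
print("height in m: ", labels_m)
```

With height in centimeters, the 40 cm gaps in height outweigh the 10-year gaps in age, so the patients are grouped by height. With height in meters, the height differences shrink to 0.4 and age dominates the distance, so the same patients are grouped by age instead.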