Figure 4.12 Clusters with height expressed in meters
When the height is expressed in meters, the magnitude of the ages dominates
the distance calculation between two points. The height attribute contributes at
most the square of the difference between the maximum height and the minimum
height, $(h_{\max} - h_{\min})^2$, to the radicand, the number under the square
root symbol in the distance formula given in Equation 4.3. Age can contribute as
much as $(a_{\max} - a_{\min})^2$ to the radicand when measuring the distance.
Because the range of heights in meters is tiny compared with the range of ages in
years, the age term is far larger and effectively decides which points are close.
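A minimal Python sketch of this effect, assuming Equation 4.3 is the usual Euclidean distance. It uses two hypothetical people (illustrative values, not the data behind Figure 4.12) and prints each attribute's squared-difference term under the square root. The helper names `radicand_terms` and `euclidean` are hypothetical.

```python
import math


def radicand_terms(p, q):
    """Return the squared difference for each attribute of points p and q.

    These are the terms summed under the square root in the Euclidean
    distance; their relative sizes show which attribute dominates.
    """
    return {name: (p[name] - q[name]) ** 2 for name in p}


def euclidean(p, q):
    """Euclidean distance: square root of the sum of the radicand terms."""
    return math.sqrt(sum(radicand_terms(p, q).values()))


# Hypothetical people: age in years, height in meters.
a = {"age": 20, "height_m": 1.50}
b = {"age": 60, "height_m": 1.90}

terms = radicand_terms(a, b)
print(terms)            # age term is 1600; height term is about 0.16
print(euclidean(a, b))  # about 40.002, almost exactly the age difference
```

A 0.4 m height difference adds only about 0.16 to the radicand, so the distance is essentially the 40-year age gap.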
Rescaling
Attributes that are expressed in dollars are common in clustering analyses and
can differ greatly in magnitude from the other attributes. For example, if personal
income is expressed in dollars and age is expressed in years, the income attribute,
often exceeding $10,000, can easily dominate the distance calculation, since ages
are typically less than 100 years.
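To make the income example concrete, here is a small Python sketch with three hypothetical people stored as (age in years, income in dollars) tuples; the values are invented for illustration. With raw dollars, a 40-year age gap paired with a $1,000 income gap comes out "closer" than a 1-year age gap paired with a $5,000 income gap.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length tuples of attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Hypothetical people as (age in years, income in dollars).
young = (25, 40_000)
old = (65, 41_000)
peer = (26, 45_000)

# 40-year age gap, $1,000 income gap: sqrt(40**2 + 1000**2), about 1000.8
d_age_gap = euclidean(young, old)
# 1-year age gap, $5,000 income gap: sqrt(1**2 + 5000**2), about 5000.0
d_income_gap = euclidean(young, peer)

print(d_age_gap, d_income_gap)
```

The age difference barely moves either distance; the income column alone decides which pair is nearer.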
Although some adjustments could be made by expressing the income in thousands
of dollars (for example, 10 for $10,000), a more straightforward method is to
divide each attribute by the attribute's standard deviation. The resulting attributes
will each have a standard deviation equal to 1 and will be without units. Returning
to the age and height example, the standard deviations are 23.1 years and 36.4 cm,
respectively. Dividing each attribute value by the appropriate standard deviation
and performing the k-means analysis yields the result shown in Figure 4.13.
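The rescaling step can be sketched in Python with only the standard library. This is a minimal illustration, assuming the sample standard deviation (`statistics.stdev`) is used; the function name `scale_by_std` and the data values are hypothetical, not the data behind Figure 4.13. The scaled rows would then be passed to whatever k-means routine is in use.

```python
import statistics


def scale_by_std(rows):
    """Divide every column of `rows` by that column's sample standard deviation.

    `rows` is a list of equal-length tuples of numbers. The result has the
    same shape; each column becomes unitless with a standard deviation of 1.
    """
    columns = list(zip(*rows))
    sds = [statistics.stdev(col) for col in columns]
    return [tuple(value / sd for value, sd in zip(row, sds)) for row in rows]


# Hypothetical (age in years, height in cm) observations.
data = [(18, 152.0), (35, 170.0), (52, 181.0), (70, 165.0), (41, 158.0)]

scaled = scale_by_std(data)
for col in zip(*scaled):
    print(round(statistics.stdev(col), 6))  # 1.0 for each column
```

Using the population standard deviation (`statistics.pstdev`) instead would multiply every scaled value by the same factor, so k-means cluster assignments would not change. A column with zero variance would raise `ZeroDivisionError` here and needs to be dropped or handled separately.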