Table 7-11 Customers' (a) original attribute values and (b) normalized attribute values
          (a) Original          (b) Normalized
Customer  Age    Income         Age     Income
C1        70     55,000         1       0.4
C2        17     30,000         0.036   0.2
C3        30     25,000         0.273   0.16
C4        65     45,000         0.909   0.32
C5        27     60,000         0.218   0.44
C6        35    130,000         0.364   1
C7        45    120,000         0.545   0.92
C8        20      5,000         0.091   0
C9        15      6,000         0       0.008
C10       25     45,000         0.182   0.32
with no clear notion of one value being close to another. To address
numerical attributes, the values can be normalized, as discussed in
Section 3.2, to bring numerical attributes to the same scale. After
normalizing the data in our example using min-max normalization,
the values of age and income are brought to the same scale—values
between 0 and 1. To address categorical attributes for a distance-
based algorithm like k-means, the attributes are exploded, as discussed
in Section 3.2. This converts the categorical attributes into multiple
attributes with numerical values.
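
The two preparation steps described above can be sketched in a few lines of Java. This is only an illustrative sketch, not the JDM API: the class NormalizeSketch, its methods, and the marital_status categories are hypothetical names chosen for the example. It assumes that min-max normalization maps each value v to (v - min) / (max - min), and that exploding means creating one 0/1 attribute per category.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of JDM.
class NormalizeSketch {

    // Min-max normalization: (v - min) / (max - min), so results lie in [0, 1].
    static double[] minMax(double[] values) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant attribute has zero range; map it to 0 instead of dividing by zero.
            out[i] = (range == 0.0) ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    // Explodes one categorical value into one 0/1 attribute per category.
    static Map<String, Integer> explode(String value, List<String> categories) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (String category : categories) {
            out.put(category, category.equals(value) ? 1 : 0);
        }
        return out;
    }

    public static void main(String[] args) {
        // Ages and incomes of customers C1..C10 from Table 7-11(a).
        double[] age = {70, 17, 30, 65, 27, 35, 45, 20, 15, 25};
        double[] income = {55000, 30000, 25000, 45000, 60000,
                           130000, 120000, 5000, 6000, 45000};
        double[] normAge = minMax(age);
        double[] normIncome = minMax(income);
        for (int i = 0; i < age.length; i++) {
            System.out.printf("C%d  %.3f  %.3f%n", i + 1, normAge[i], normIncome[i]);
        }
        // Hypothetical categorical attribute marital_status with three categories.
        System.out.println(explode("married",
                Arrays.asList("single", "married", "divorced")));
    }
}
```

With the data of Table 7-11(a), the oldest customer (C1, age 70) maps to 1 and the youngest (C9, age 15) maps to 0, so age and income contribute on comparable scales when distances are computed.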
Since our example is in two dimensions (attributes), we can easily
graph the clusters, as illustrated in Figure 7-12, and visually identify
the customer clusters. However, clustering problems can involve
hundreds or even thousands of attributes, requiring alternative
analysis and visualization techniques [Keahey 1999].
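
Because each customer is now a point in the two-dimensional (age, income) plane, closeness between two customers can be computed directly. The following is a minimal sketch, assuming Euclidean distance over the min-max normalized values; the class DistanceSketch and its methods are hypothetical names for illustration and are not JDM classes.

```java
// Hypothetical helper class for illustration; not part of JDM.
class DistanceSketch {

    // Per-attribute comparison: absolute difference of two numerical values.
    static double absoluteDifference(double a, double b) {
        return Math.abs(a - b);
    }

    // Aggregation: Euclidean distance, the square root of the sum of
    // squared per-attribute differences.
    static double euclidean(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("cases must have the same number of attributes");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = absoluteDifference(x[i], y[i]);
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Min-max normalized (age, income) for customers C1, C4, and C9.
        double[] c1 = {1.000, 0.400};
        double[] c4 = {0.909, 0.320};
        double[] c9 = {0.000, 0.008};
        System.out.printf("d(C1, C4) = %.3f%n", euclidean(c1, c4));
        System.out.printf("d(C1, C9) = %.3f%n", euclidean(c1, c9));
    }
}
```

Under these assumptions, C1 and C4 (both older customers with mid-range incomes) come out far closer to each other, roughly 0.12, than C1 and C9, roughly 1.07, which is the kind of separation a distance-based algorithm relies on when forming clusters.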
When two cases are compared, we can use distance or similarity.
Both distance and similarity can be computed by first comparing
pairs of attribute values and then aggregating the results to arrive at
a final comparison measure between the two cases. To this end, JDM
defines commonly used aggregation functions, such as Euclidean
distance, and attribute comparison functions, such as absolute difference
or similarity matrix. For categorical attributes, JDM allows the user
to specify explicit similarity values using a similarity matrix. For
example, if credit_risk is a categorical attribute with values high,
medium, and low,