factors for the 20 dimensions are random numbers uniformly distributed between
1 and 20.
Table 9.1. Dimensions of the Input Clusters

Input     Dimensions                       Points
A         0, 2, 3, 10, 16, 17, 18          16768
B         9, 10, 12, 16, 17, 18, 19        23859
C         1, 3, 9, 12, 13, 14, 16          25678
D         1, 5, 9, 10, 12, 13, 14          23093
E         3, 6, 9, 10, 11, 12, 14          5602
Outlier                                    5000
Table 9.2 shows a typical result for the dimensions of the output clusters found
by PROCLUS and IPROCLUS. The sets of dimensions of the output clusters found by
IPROCLUS correspond closely to those of their corresponding input clusters.
Table 9.2. Dimensions of the Output Clusters for IPROCLUS and PROCLUS

IPROCLUS  Dimensions                       Points
1         1, 5, 9, 10, 12, 13, 14          23304
2         0, 2, 3, 10, 14, 16, 18          2802
3         9, 10, 12, 16, 17, 18, 19        25034
4         1, 3, 9, 12, 13, 14, 16          25982
5         0, 2, 3, 10, 16, 17, 18          15991
Outliers                                   6887

PROCLUS   Dimensions                       Points
1         0, 2, 3, 4, 9, 12, 16, 17, 18    17660
2         3, 9, 12, 16, 17, 18             24971
3         3, 4, 9, 12, 16, 189             11844
4         1, 3, 9, 12, 13, 14, 16          19130
5         0, 3, 4, 8, 9, 12, 18            8236
Outliers                                   18159
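
The correspondence between dimension sets can be checked mechanically. The sketch below is only an illustration: it copies the dimension sets from Table 9.1 and from the IPROCLUS half of Table 9.2, and pairs each output cluster with the input cluster whose dimension set overlaps it most. The Jaccard index used to score the overlap is an assumption made here for convenience, not a measure taken from the text, and the names jaccard and best_match are hypothetical.

```python
# Dimension sets copied from Table 9.1 (input clusters).
INPUT_DIMS = {
    "A": {0, 2, 3, 10, 16, 17, 18},
    "B": {9, 10, 12, 16, 17, 18, 19},
    "C": {1, 3, 9, 12, 13, 14, 16},
    "D": {1, 5, 9, 10, 12, 13, 14},
    "E": {3, 6, 9, 10, 11, 12, 14},
}

# Dimension sets copied from the IPROCLUS half of Table 9.2 (output clusters).
IPROCLUS_DIMS = {
    1: {1, 5, 9, 10, 12, 13, 14},
    2: {0, 2, 3, 10, 14, 16, 18},
    3: {9, 10, 12, 16, 17, 18, 19},
    4: {1, 3, 9, 12, 13, 14, 16},
    5: {0, 2, 3, 10, 16, 17, 18},
}


def jaccard(a, b):
    """Overlap score |a & b| / |a | b| (an assumed measure, not from the text)."""
    return len(a & b) / len(a | b)


def best_match(output_dims, input_dims):
    """Map each output cluster to (best input cluster, overlap score)."""
    result = {}
    for out_id, dims in output_dims.items():
        name = max(input_dims, key=lambda k: jaccard(dims, input_dims[k]))
        result[out_id] = (name, jaccard(dims, input_dims[name]))
    return result


if __name__ == "__main__":
    for out_id, (name, score) in best_match(IPROCLUS_DIMS, INPUT_DIMS).items():
        print(f"output {out_id} -> input {name} (Jaccard {score:.2f})")
```

This compares dimension sets only. It says nothing about which points were assigned to which cluster.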
Table 9.3 gives the confusion matrix for the output clusters in Table 9.2. The
confusion matrix is defined in the same way as in the PROCLUS paper: entry (i, j)
is equal to the number of data points assigned to output cluster i that were
generated as part of input cluster j. IPROCLUS discovers output clusters in which
the majority of points come from one input cluster. In other words, it recognizes
the natural clustering of the points. More specifically, we calculate the accuracy
from the confusion matrix. To define the accuracy, for each output cluster i, we
identify the input cluster j with which it shares the largest number of points. We
say that output cluster i corresponds to input cluster j. All points in their common
intersection are clustered correctly. All the other points in output cluster i are clus-