Table 2. Rates of experimental errors of ID3 and k-NN-boosting

Database       ID3 error      k-NN-boosting  K value  k-NN-boosting  K value  Average
                              (best)                  (worst)                 (over all K)
Diabetes       29.43 ± 0.40   29.04 ± 1.78      5     32.68 ± 0.87     10     31.26 ± 1.37
Australian     18.26 ± 1.31   17.97 ± 0.78      6     19.42 ± 1.26      1     18.55 ± 0.32
Heart          27.78 ± 0.77   21.85 ± 0.66      1     27.78 ± 3.10      6     25.48 ± 3.29
Monk2          53.95 ± 5.58   43.74 ± 5.30      4     46.75 ± 0.73      5     45.09 ± 1.03
Wine            7.29 ± 0.53    5.03 ± 1.69      2      5.59 ± 1.87      1      5.04 ± 0.06
Zoo             3.91 ± 1.36    2.91 ± 1.03      4      3.91 ± 1.36      1      3.41 ± 0.25
Waveform-21    24.84 ± 0.25   23.02 ± 0.27      5     25.26 ± 0.38      8     24.22 ± 0.45
Nettalk        25.96 ± 0.27   25.81 ± 0.50      7     26.09 ± 0.44     10     25.95 ± 0.01
Letter         11.66 ± 0.20   11.47 ± 0.25      2     11.86 ± 0.21      9     11.66 ± 0.02
Shuttle         0.02 ± 0.11    0.02 ± 0.11    any      0.02 ± 0.11    any      0.02 ± 0.00
In nine out of the ten databases there exists a value of K for which k-NN-boosting outperforms ID3; in the remaining case the performance is similar. In two out of the ten databases k-NN-boosting outperforms ID3 even for the worst K value with respect to accuracy, and in another three the two algorithms behave similarly.
Table 3 shows the results of applying the Wilcoxon signed rank test [30] to compare the relative performance of ID3 and k-NN-boosting on the ten databases tested. In three out of the ten databases (Heart, Monk2 and Waveform-21) there are significant improvements at a 95% confidence level, while no significantly worse performance is found in any database for any K value.
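As a rough illustration of how such a paired comparison can be carried out, the sketch below applies a two-sided Wilcoxon signed rank test to per-fold error rates of the two classifiers on a single database. The error values, the number of folds and the variable names are invented placeholders for illustration, not the measurements reported here.

```python
# Minimal sketch of a paired Wilcoxon signed rank test, assuming per-fold
# error rates (%) are available for both classifiers on the same database.
# The numbers below are illustrative placeholders only.
from scipy.stats import wilcoxon

id3_errors       = [29.1, 30.2, 28.7, 29.9, 29.5, 28.8, 30.1, 29.3, 29.6, 29.0]
knn_boost_errors = [28.4, 29.8, 28.1, 29.0, 29.2, 28.3, 29.5, 28.9, 29.1, 28.6]

# Two-sided test on the paired differences between the two error samples.
stat, p_value = wilcoxon(id3_errors, knn_boost_errors)
print(f"W = {stat:.1f}, p = {p_value:.4f}")

# A p-value below 0.05 corresponds to a significant difference at the
# 95% confidence level used in the comparison above.
if p_value < 0.05:
    print("significant difference at the 95% confidence level")
```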
Let us observe that in several cases where no significant difference can be found, the mean error obtained by the proposed approach is nevertheless lower than that of ID3, as explained above.
To give an idea of the increase in the number of instances that this approach implies, Table 4 shows the sizes of the augmented databases. The values appearing in the column labeled K = correspond to the size of the database generated from the entire original database when applying the first step of k-NN-boosting. As can be seen, the size increase is not very high, so it does not really affect the computational load of the classification tree induction performed by the ID3 algorithm.
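To make the two-step structure concrete (a k-NN-based augmentation of the training database, followed by tree induction on the augmented data), here is a minimal sketch. The augmentation rule used below, adding a relabeled copy only of those instances whose K nearest neighbours vote for a different class, is an assumption made for illustration and is not the rule defined by k-NN-boosting; likewise, scikit-learn's entropy-based decision tree merely stands in for ID3.

```python
# Hypothetical sketch of the two-step flow described in the text:
# (1) build an augmented database from the original one using the k-NN paradigm,
# (2) induce a classification tree from the augmented database.
# Both the augmentation rule and the tree learner are illustrative stand-ins.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def knn_augment(X, y, k=5):
    """Return (X, y) extended with relabeled copies of boundary instances (assumed rule)."""
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    y_knn = knn.predict(X)        # majority class among each instance's K neighbours
    disagree = y_knn != y         # instances whose neighbourhood suggests another class
    X_aug = np.vstack([X, X[disagree]])
    y_aug = np.concatenate([y, y_knn[disagree]])
    return X_aug, y_aug

# Illustrative data; any numeric feature matrix X with labels y would do.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_aug, y_aug = knn_augment(X, y, k=5)
print(f"original size: {len(y)}, augmented size: {len(y_aug)}")   # cf. Table 4

tree = DecisionTreeClassifier(criterion="entropy").fit(X_aug, y_aug)
print(f"training accuracy: {tree.score(X, y):.3f}")
```

With this assumed rule only instances near a class boundary are duplicated, which would keep the size increase modest, in line with the observation about Table 4.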
K-NN-boosting is a model induction algorithm belonging to the classification tree family, in which the k-NN paradigm is used only to modify the database from which the tree structure is learned.