Table 2. Rates of experimental errors of ID3 and k-NN-boosting

Database       ID3 error      k-NN-boosting  K value  k-NN-boosting  K value  Average
                              (best)                  (worst)                 (over all K)
Diabetes       29.43 ± 0.40   29.04 ± 1.78      5     32.68 ± 0.87     10     31.26 ± 1.37
Australian     18.26 ± 1.31   17.97 ± 0.78      6     19.42 ± 1.26      1     18.55 ± 0.32
Heart          27.78 ± 0.77   21.85 ± 0.66      1     27.78 ± 3.10      6     25.48 ± 3.29
Monk2          53.95 ± 5.58   43.74 ± 5.30      4     46.75 ± 0.73      5     45.09 ± 1.03
Wine            7.29 ± 0.53    5.03 ± 1.69      2      5.59 ± 1.87      1      5.04 ± 0.06
Zoo             3.91 ± 1.36    2.91 ± 1.03      4      3.91 ± 1.36      1      3.41 ± 0.25
Waveform-21    24.84 ± 0.25   23.02 ± 0.27      5     25.26 ± 0.38      8     24.22 ± 0.45
Nettalk        25.96 ± 0.27   25.81 ± 0.50      7     26.09 ± 0.44     10     25.95 ± 0.01
Letter         11.66 ± 0.20   11.47 ± 0.25      2     11.86 ± 0.21      9     11.66 ± 0.02
Shuttle         0.02 ± 0.11    0.02 ± 0.11    any      0.02 ± 0.11    any      0.02 ± 0.00
In nine out of the ten databases there exists a value of K for which k-NN-boosting outperforms ID3; in the remaining case the performance is similar. In two out of the ten databases k-NN-boosting outperforms ID3 even for the worst K value with respect to accuracy, and in another three the two algorithms behave similarly.
Table 3 shows the results of applying the Wilcoxon signed rank test [30] to compare the relative performance of ID3 and k-NN-boosting on the ten databases tested. In three out of the ten databases (Heart, Monk2 and Waveform-21) there are significant improvements at a 95% confidence level, while no significantly worse performance is found in any database for any K value.
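As a rough illustration of how such a paired comparison can be carried out, the sketch below applies a two-sided Wilcoxon signed rank test to per-fold error rates of the two classifiers on a single database. The error values, the number of folds and the variable names are invented placeholders for illustration, not the measurements reported here.

```python
# Minimal sketch of a paired Wilcoxon signed rank test, assuming per-fold
# error rates (%) are available for both classifiers on the same database.
# The numbers below are illustrative placeholders only.
from scipy.stats import wilcoxon

id3_errors       = [29.1, 30.2, 28.7, 29.9, 29.5, 28.8, 30.1, 29.3, 29.6, 29.0]
knn_boost_errors = [28.4, 29.8, 28.1, 29.0, 29.2, 28.3, 29.5, 28.9, 29.1, 28.6]

# Two-sided test on the paired differences between the two error samples.
stat, p_value = wilcoxon(id3_errors, knn_boost_errors)
print(f"W = {stat:.1f}, p = {p_value:.4f}")

# A p-value below 0.05 corresponds to a significant difference at the
# 95% confidence level used in the comparison above.
if p_value < 0.05:
    print("significant difference at the 95% confidence level")
```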
Let us observe that in several cases where no significant difference can be found, the mean error obtained by the proposed approach is nevertheless lower than that of ID3, as explained above.
To give an idea of the increase in the number of instances that this approach implies, Table 4 shows the sizes of the augmented databases. The values appearing in the column labeled K = correspond to the size of the database generated from the entire original database when applying the first step of k-NN-boosting. As can be seen, the size increase is not very high, so it does not really affect the computational load of the classification tree induction performed by the ID3 algorithm.
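To make the two-step structure concrete (a k-NN-based augmentation of the training database, followed by tree induction on the augmented data), here is a minimal sketch. The augmentation rule used below, adding a relabeled copy only of those instances whose K nearest neighbours vote for a different class, is an assumption made for illustration and is not the rule defined by k-NN-boosting; likewise, scikit-learn's entropy-based decision tree merely stands in for ID3.

```python
# Hypothetical sketch of the two-step flow described in the text:
# (1) build an augmented database from the original one using the k-NN paradigm,
# (2) induce a classification tree from the augmented database.
# Both the augmentation rule and the tree learner are illustrative stand-ins.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def knn_augment(X, y, k=5):
    """Return (X, y) extended with relabeled copies of boundary instances (assumed rule)."""
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    y_knn = knn.predict(X)        # majority class among each instance's K neighbours
    disagree = y_knn != y         # instances whose neighbourhood suggests another class
    X_aug = np.vstack([X, X[disagree]])
    y_aug = np.concatenate([y, y_knn[disagree]])
    return X_aug, y_aug

# Illustrative data; any numeric feature matrix X with labels y would do.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_aug, y_aug = knn_augment(X, y, k=5)
print(f"original size: {len(y)}, augmented size: {len(y_aug)}")   # cf. Table 4

tree = DecisionTreeClassifier(criterion="entropy").fit(X_aug, y_aug)
print(f"training accuracy: {tree.score(X, y):.3f}")
```

With this assumed rule only instances near a class boundary are duplicated, which would keep the size increase modest, in line with the observation about Table 4.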
K-NN-boosting is a model induction algorithm belonging to the classification tree family, in which the k-NN paradigm is used only to modify the database from which the tree structure is learned.