to class “-” is very low, so a new split in the tree is not considered. After the lonely instance is duplicated, however, the density of examples belonging to its class grows, making a further split of the tree possible and allowing different decision borders to be built.
If the two sources of instability mentioned above were generated at random, no improvement in the final accuracy could be expected. We wanted to test whether instability generated according to the cases misclassified by another algorithm (k-NN) could lead to an improvement over the accuracy yielded by the original ID3. The next section presents the experimental results we obtained.
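To make the mechanism concrete, the fragment below is a minimal sketch of this duplication step, not the authors' implementation: it assumes scikit-learn's KNeighborsClassifier and DecisionTreeClassifier (the latter standing in for ID3, which scikit-learn does not provide), and the function name knn_boosting_fit is ours.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def knn_boosting_fit(X, y, k=3):
    """Duplicate the training cases that k-NN misclassifies,
    then grow a decision tree on the enlarged training set."""
    # NOTE: predicting on the training data keeps each point among its
    # own neighbors; whether the paper uses leave-one-out k-NN instead
    # is an assumption we do not resolve here.
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    misclassified = knn.predict(X) != y            # boolean mask
    X_aug = np.vstack([X, X[misclassified]])       # duplicated instances
    y_aug = np.concatenate([y, y[misclassified]])
    # entropy criterion approximates ID3's information-gain splits
    return DecisionTreeClassifier(criterion="entropy").fit(X_aug, y_aug)

Duplicating a misclassified instance doubles its weight in the impurity computation, which is what raises the local class density enough to trigger splits the original tree would not consider.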
5 Experimental Results
Ten databases are used to test our hypothesis. All of them were obtained from the UCI Machine Learning Repository [2]. These domains are publicly available at the Statlog project web page [18]. The characteristics of the databases are given in Table 1. As can be seen, we have chosen different types of databases, including some with a large number of predictor variables, some with a large number of cases, and some multi-class problems.
Table 1. Details of databases

Database      Number of cases   Number of classes   Number of attributes
Diabetes            768                 2                     8
Australian          690                 2                    14
Heart               270                 2                    13
Monk2               432                 2                     6
Wine                178                 3                    13
Zoo                 101                 7                    16
Waveform-21        5000                 3                    21
Nettalk           14471               324                   203
Letter            20000                26                    16
Shuttle           58000                 7                     9
In order to give a realistic view of the applied methods, we use 10-fold cross-validation [29] in all experiments. Each database has been randomly split into ten training sets with their corresponding test sets. The same validation files have always been used for both algorithms: ID3 and our approach, k-NN-boosting. Ten executions of k-NN-boosting have been carried out for every 10-fold set, one for each value of K ranging from 1 to 10. Table 2 compares the error rate of ID3 with the best and worst performance of k-NN-boosting, along with the average error rate over the ten values of K used in the experiment. The cases in which k-NN-boosting outperforms ID3 are shown in boldface. Note that in six out of ten databases the average over the ten executions of k-NN-boosting outperforms ID3, and in two of the remaining four cases the performance is similar.
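As a sketch of this evaluation protocol (under the same assumptions as the earlier fragment, and reusing the hypothetical knn_boosting_fit), the code below reuses identical folds for both algorithms and reports the ID3 error together with the best, worst, and average k-NN-boosting error over K = 1..10, mirroring the quantities in Table 2.

from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def evaluate(X, y, n_splits=10):
    # Fix the folds once so ID3 and k-NN-boosting see identical splits.
    folds = list(KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X))
    id3_errors = []
    boost_errors = {k: [] for k in range(1, 11)}
    for train, test in folds:
        # Baseline: a plain entropy-based tree as a stand-in for ID3.
        tree = DecisionTreeClassifier(criterion="entropy").fit(X[train], y[train])
        id3_errors.append((tree.predict(X[test]) != y[test]).mean())
        # One k-NN-boosting run per value of K on the same fold.
        for k in range(1, 11):
            model = knn_boosting_fit(X[train], y[train], k=k)
            boost_errors[k].append((model.predict(X[test]) != y[test]).mean())
    avg_by_k = {k: sum(v) / len(v) for k, v in boost_errors.items()}
    return (sum(id3_errors) / len(id3_errors),   # ID3 error rate
            min(avg_by_k.values()),              # best K
            max(avg_by_k.values()),              # worst K
            sum(avg_by_k.values()) / 10)         # average over K = 1..10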