According to these criteria, we propose the following optimization model:
\[
\text{Minimize } ER(x), \qquad \text{Minimize } |x| \tag{7}
\]
Note that the objectives in the optimization model (7) are contradictory, since a lower number of significant cases means a higher error rate and vice versa; that is, the greater the number of selected cases, the smaller the error rate. The solution to model (7) is a set of $X$ non-dominated solutions $C = \{x^k\}$, $k \in \{1, \dots, X\}$, where each solution $x^k$ of $C$ represents the best collection of $k$ significant cases.
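To make the encoding concrete, the following sketch evaluates the two objectives of model (7) for a candidate solution represented as a binary mask over the case memory. This is a minimal illustration, not the paper's implementation: the function names, the mask encoding, and the leave-one-out estimate of $ER(x)$ are all assumptions.

```python
import numpy as np

def error_rate(mask, X, y):
    """Estimate ER(x): leave-one-out 1-NN error when only the cases
    selected by `mask` may act as neighbours (an assumed estimator)."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:
        return 1.0                      # an empty case memory classifies nothing
    errors = 0
    for i in range(len(X)):
        cand = selected[selected != i]  # selected cases other than the query
        if cand.size == 0:
            errors += 1
            continue
        d = np.linalg.norm(X[cand] - X[i], axis=1)
        errors += int(y[cand[np.argmin(d)]] != y[i])
    return errors / len(X)

def objectives(mask, X, y):
    """The two objectives of model (7): (ER(x), |x|)."""
    return error_rate(mask, X, y), int(mask.sum())
```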
From a practical point of view, and in order to simplify the model, it is worthwhile to sacrifice a small amount of accuracy when the number of cases can be reduced significantly. Some examples are provided in Section 4.
We propose the NSGA-II [6] and SPEA-2 [33] multiobjective evolutionary algorithms to solve the problem.
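As one possible realisation (the authors do not detail their implementation here), a binary-coded NSGA-II can be assembled with the DEAP library; the population size, operator rates, and the stand-in evaluation function below are illustrative assumptions.

```python
import random
from deap import algorithms, base, creator, tools

N_CASES, POP, GENS = 150, 100, 50   # illustrative sizes

creator.create("FitnessMin", base.Fitness, weights=(-1.0, -1.0))  # minimise ER(x) and |x|
creator.create("Individual", list, fitness=creator.FitnessMin)

toolbox = base.Toolbox()
toolbox.register("bit", random.randint, 0, 1)
toolbox.register("individual", tools.initRepeat, creator.Individual,
                 toolbox.bit, n=N_CASES)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutFlipBit, indpb=1.0 / N_CASES)
toolbox.register("select", tools.selNSGA2)  # NSGA-II environmental selection

def evaluate(ind):
    # Stand-in for (ER(x), |x|) that still shows the trade-off of model (7);
    # in practice this would call an error estimator such as objectives() above.
    size = sum(ind)
    return 1.0 / (1 + size), float(size)

toolbox.register("evaluate", evaluate)

pop = toolbox.population(n=POP)
for ind in pop:
    ind.fitness.values = toolbox.evaluate(ind)
for _ in range(GENS):
    offspring = algorithms.varAnd(pop, toolbox, cxpb=0.9, mutpb=0.1)
    for ind in offspring:
        ind.fitness.values = toolbox.evaluate(ind)
    pop = toolbox.select(pop + offspring, k=POP)  # (mu + lambda) survival
# `pop` now approximates the non-dominated set C of model (7)
```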
4 Experiments and Results
In this section, we present a practical application of the proposed methodology. We evaluate the case selection methods described in Section 3.3 using case memories from different domains. In particular, we consider standard datasets from the UCI repository (http://archive.ics.uci.edu/ml/). Following this methodology, we set f = 10 (that is, cross-validation with 10 folds) and K = 1 (i.e. a 1-NN classifier).
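For reference, this evaluation protocol can be reproduced with off-the-shelf tools; the scikit-learn calls below are an illustration of the setup, not the code used in the experiments.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)           # one of the UCI datasets considered
clf = KNeighborsClassifier(n_neighbors=1)   # K = 1, i.e. a 1-NN classifier
scores = cross_val_score(clf, X, y, cv=10)  # f = 10 folds
print(f"10-fold CV error rate: {1 - scores.mean():.3f}")
```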
Table 1 summarises the experiments. For each case memory (rows), the best results are highlighted in boldface and the worst in italics.
In general, CNN and RNN achieve large size reductions on noise-free case memories; however, their error remains higher than that of the control methods in every case. In all the experiments, ENN and All-KNN maintain or improve the error rate. If the case memory has no noisy instances the reduction is negligible; otherwise the reduction is clearly significant. Methods such as IB2, IB3 and Shrink achieve large size reductions when they select instances from a case memory with well-defined boundaries, but they are too sensitive to the presence of noisy instances.
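As a reminder of how the editing methods behave, the sketch below implements Wilson's ENN rule, which removes every case misclassified by the majority vote of its k nearest neighbours; the brute-force distance computation and the choice k = 3 are illustrative.

```python
import numpy as np

def enn_filter(X, y, k=3):
    """Edited Nearest Neighbour (Wilson editing): keep a case only if the
    majority vote of its k nearest neighbours agrees with its label."""
    keep = np.ones(len(X), dtype=bool)
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                        # exclude the case itself
        votes = y[np.argsort(d)[:k]]
        labels, counts = np.unique(votes, return_counts=True)
        if labels[np.argmax(counts)] != y[i]:
            keep[i] = False                  # misclassified, so edit it out
    return X[keep], y[keep]
```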
Evaluating the methods on each dataset highlights the suitability of some of them. For small datasets with well-defined boundaries (Iris and Wine), the SPEA-2 algorithm seems to be the best approach, since it reduces the case memory by about 50% while maintaining an acceptable error rate. For larger datasets with no clear boundaries (the Yeast or Breast Cancer datasets), the DROP algorithms also achieve an effective reduction of the memory (approx. 80%); however, ENN and RENN reach a solid reduction while also minimising the error rate, at a lower time cost. Note that in the medical domain (e.g. the Breast Cancer dataset) other aspects must be taken into account. In this sense, NSGA-II seems to be the most effective algorithm, since it reduces the case memory by about 50% while maintaining the error rate as well as the kappa coefficient, specificity and sensitivity. According to the experiments, ENN and RENN also seem useful for large datasets with a high number of classes (such as Abalone), improving the accuracy of the system while reducing the case memory by 80%.
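Since the medical-domain comparison relies on the kappa coefficient, specificity and sensitivity, the fragment below shows one common way to obtain them for a binary problem; the arrays and the use of scikit-learn are illustrative assumptions.

```python
from sklearn.metrics import cohen_kappa_score, confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]   # hypothetical gold labels
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]   # hypothetical 1-NN predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)        # true-positive rate
specificity = tn / (tn + fp)        # true-negative rate
kappa = cohen_kappa_score(y_true, y_pred)
print(sensitivity, specificity, kappa)
```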
 