Instance Selection Using Evolutionary Algorithms: An Experimental Study - Advanced Techniques in Knowledge Discovery and Data Mining


The objective of the EA is to maximize the fitness function defined, i.e.,

maximize the classification performance and minimize the number of instances

obtained. In the experiments, we have considered α = 0.5.
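The trade-off described above can be illustrated with a small sketch. This excerpt does not reproduce the fitness definition itself, so the weighted-sum form used here, α times the classification rate plus (1 − α) times the percentage of reduction, is an assumption that is merely consistent with the description. The function and parameter names (`reduction_percentage`, `fitness`, `clas_rate`, `perc_red`) are hypothetical.

```python
def reduction_percentage(n_training: int, n_selected: int) -> float:
    """Percentage of training instances removed by the selection."""
    if n_training <= 0:
        raise ValueError("n_training must be positive")
    if not 0 <= n_selected <= n_training:
        raise ValueError("n_selected must lie between 0 and n_training")
    return 100.0 * (n_training - n_selected) / n_training


def fitness(clas_rate: float, perc_red: float, alpha: float = 0.5) -> float:
    """Assumed weighted-sum fitness; both inputs are percentages in [0, 100].

    alpha = 0.5 (the value reported in the text) weights classification
    performance and reduction equally.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * clas_rate + (1.0 - alpha) * perc_red


# Example: a subset keeping 20 of 200 instances that classifies 80% correctly.
example_reduction = reduction_percentage(200, 20)
example_fitness = fitness(80.0, example_reduction)
```

With α = 0.5, one percentage point gained in classification rate is worth exactly as much as one percentage point gained in reduction under this assumed form.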

5.6 Methodology for the Experiments

In this section, we present the methodology followed for the experiments. Section

5.6.1 describes the data sets used, Section 5.6.2 explains the partitions of the data

sets that were considered for applying the algorithms, and finally, Section 5.6.3

introduces the parameters associated with the algorithms.

5.6.1 Data Sets

A different group of data sets has been considered for each problem.

5.6.1.1 Instance Selection - Prototype Selection

For PS, we have evaluated 10 classical data sets used in machine learning [39],

shown in Table 5.1.

Cleveland: This database contains 76 attributes, but all published experiments refer to using a subset of 13 of them. In particular, the Cleveland database is the only one that has been used by machine learning researchers to this date. The “goal” field refers to the presence of heart disease in the patient. It is integer-valued from 0 (no presence) to 4. Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1, 2, 3, 4) from absence (value 0).

Table 5.1. Data sets for IS-PS.

Data set        Num. instances   Num. features   Num. classes
Cleveland       297              13              2
Glass           214              9               6
Iris            150              4               3
LED24Digit      200              24              10
LED7Digit       500              7               10
Lymphography    148              18              4
Monk            432              6               2
Pima            768              8               2
Wine            178              13              3
Wisconsin       683              9               2
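The reduction of the Cleveland “goal” field to a two-class problem, as described for this data set, can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code. The function name `binarize_goal` and the sample values are hypothetical.

```python
def binarize_goal(goal: int) -> int:
    """Map the integer goal field (0 to 4) to absence (0) or presence (1)."""
    if goal not in range(5):
        raise ValueError(f"goal must be an integer from 0 to 4, got {goal!r}")
    return 0 if goal == 0 else 1


# Hypothetical sample of goal values, used only for illustration.
goals = [0, 2, 0, 4, 1, 3]
labels = [binarize_goal(g) for g in goals]
```

Every nonzero goal value collapses into the single “presence” class, which matches the two classes listed for Cleveland in Table 5.1.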
