• By selecting features [6], [24], we reduce the number of columns in a data set.
• By discretizing feature values [14], we reduce the number of possible values
of the discretized features.
• By selecting instances [6], [26], we reduce the number of rows in a data set.
Instance selection (IS) is a focusing task in the data-preparation phase [8] of
KD. IS may comprise several different strategies: sampling, boosting,
prototype selection (PS), and active learning.
The topic of this chapter is precisely IS [6], [27], [33], [40] by means of
evolutionary algorithms (EAs) for data reduction in KD.
EAs [2], [3], [13] are general-purpose search algorithms that use principles
inspired by natural genetic populations to evolve solutions to problems. The basic
idea is to maintain a population of chromosomes, each representing a candidate
solution to the problem at hand, that evolves over time through a process of
competition and controlled variation. EAs have been designed to solve the IS
problem, yielding promising results [4], [19], [21], [28], [32], [41].
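As an illustration of this scheme applied to IS, the following is a minimal sketch of a generational, binary-coded EA in Python. The chromosome coding (one bit per training instance), the operators, and all parameter values are illustrative assumptions, not the specific models studied in this chapter.

```python
# Minimal sketch: a generational, binary-coded EA for instance selection.
# A chromosome is a bitmask over the training instances (1 = keep, 0 = discard).
import random

def evolve_subset(n_instances, fitness, pop_size=50, generations=100,
                  crossover_rate=0.6, mutation_rate=0.01):
    """Evolve a bitmask over the training instances; `fitness` scores a bitmask."""
    # Initial population: random bitmasks.
    population = [[random.randint(0, 1) for _ in range(n_instances)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [chrom[:] for chrom in ranked[:2]]   # elitism: keep the two best
        while len(next_gen) < pop_size:
            # Binary tournament selection of two parents (competition).
            p1 = max(random.sample(population, 2), key=fitness)
            p2 = max(random.sample(population, 2), key=fitness)
            # One-point crossover and bit-flip mutation (controlled variation).
            if random.random() < crossover_rate:
                cut = random.randrange(1, n_instances)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [1 - g if random.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```

In practice, fitness values would be cached rather than recomputed for each comparison, since evaluating a subset is the costly step.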
The goal of this chapter is to present the application of some representative EA
models for data reduction and to compare them with nonevolutionary IS
algorithms (referred to as classical algorithms in the following). To do this, we carry out our study
from two different points of view:
• IS-PS: analyzing the results when the algorithms are used for prototype selection
for classification, where 1-NN is applied to evaluate the classification rate
offered by the selected training set. We denote this point of view IS-PS
(Instance Selection - Prototype Selection).
• IS-TSS: analyzing the behavior of EAs as instance selectors for data
reduction, applying C4.5 [31] to evaluate the selected training set. We denote this
approach IS-TSS (Instance Selection - Training Set Selection). A sketch of both
evaluation schemes is given after this list.
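The sketch below illustrates the two evaluation schemes, assuming scikit-learn and NumPy are available. DecisionTreeClassifier (CART) is used only as a stand-in for C4.5, which scikit-learn does not provide, and the weighting of accuracy against reduction in the IS-PS fitness is an illustrative assumption.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def fitness_is_ps(mask, X_train, y_train, alpha=0.5):
    """IS-PS: score a selected subset by 1-NN accuracy combined with reduction rate."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=1).fit(X_train[idx], y_train[idx])
    accuracy = knn.score(X_train, y_train)          # classify the full training set
    reduction = 1.0 - idx.size / len(y_train)
    return alpha * accuracy + (1.0 - alpha) * reduction

def evaluate_is_tss(mask, X_train, y_train, X_test, y_test):
    """IS-TSS: train a decision tree on the selected subset, test on held-out data."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
    return tree.score(X_test, y_test)
```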
The second point of view, analyzing the behavior of EAs for data reduction in KD,
is really the most important aspect and the main novelty of this chapter. In
particular, we introduce a stratified EA model for evaluating this approach.
The chapter is organized as follows. In Section 5.2, we explain the main ideas
of IS, give a brief historical review, and describe the two processes in which IS
algorithms take part, IS-PS and IS-TSS. In Section 5.3, we survey the main
classical IS algorithms. In Section 5.4, we introduce the foundations of the EAs
and summarize the basic features of the models considered in this chapter. In
Section 5.5, we provide details on the way EAs may be applied to the IS problem.
In Section 5.6, we deal with the methodology for the experiments. In Section 5.7,
we include the results of the experiments and their analysis. Finally, in Section 5.8,
we present some concluding remarks.