• By selecting features [6], [24], we reduce the number of columns in a data set.
• By discretizing feature values [14], we reduce the number of possible values
of the discretized features.
• By selecting instances [6], [26], we reduce the number of rows in a data set.
Instance selection (IS) is a focusing task in the data-preparation phase [8] of
KD. IS may comprise several different strategies: sampling, boosting,
prototype selection (PS), and active learning.
The topic of this chapter is precisely IS [6], [27], [33], [40] by means of
evolutionary algorithms (EAs) for data reduction in KD.
EAs [2], [3], [13] are general-purpose search algorithms that use principles
inspired by natural genetic populations to evolve solutions to problems. The basic
idea is to maintain a population of chromosomes, each representing a candidate
solution to the problem at hand, that evolves over time through a process of
competition and controlled variation. EAs have been designed to solve the IS
problem, yielding promising results [4], [19], [21], [28], [32], [41].
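As an illustration of this scheme applied to IS, the following is a minimal sketch of a generational, binary-coded EA in Python. The chromosome coding (one bit per training instance), the operators, and all parameter values are illustrative assumptions, not the specific models studied in this chapter.

```python
# Minimal sketch: a generational, binary-coded EA for instance selection.
# A chromosome is a bitmask over the training instances (1 = keep, 0 = discard).
import random

def evolve_subset(n_instances, fitness, pop_size=50, generations=100,
                  crossover_rate=0.6, mutation_rate=0.01):
    """Evolve a bitmask over the training instances; `fitness` scores a bitmask."""
    # Initial population: random bitmasks.
    population = [[random.randint(0, 1) for _ in range(n_instances)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [chrom[:] for chrom in ranked[:2]]   # elitism: keep the two best
        while len(next_gen) < pop_size:
            # Binary tournament selection of two parents (competition).
            p1 = max(random.sample(population, 2), key=fitness)
            p2 = max(random.sample(population, 2), key=fitness)
            # One-point crossover and bit-flip mutation (controlled variation).
            if random.random() < crossover_rate:
                cut = random.randrange(1, n_instances)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [1 - g if random.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```

In practice, fitness values would be cached rather than recomputed for each comparison, since evaluating a subset is the costly step.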
The goal of this chapter is to present the application of some representative EA
models for data reduction and to compare them with nonevolutionary IS
algorithms (referred to as classical algorithms in the following). To do this, we carry out our study
from two different points of view:
• IS-PS: analyzing the results when the algorithms are used for prototype selection
for classification, where 1-NN is applied to evaluate the classification rate
offered by the selected training set. We denote this point of view IS-PS
(Instance Selection - Prototype Selection).
• IS-TSS: analyzing the behavior of EAs as instance selectors for data
reduction, applying C4.5 [31] to evaluate the selected training set. We denote this
approach IS-TSS (Instance Selection - Training Set Selection). A sketch of both
evaluation schemes is given after this list.
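The sketch below illustrates the two evaluation schemes, assuming scikit-learn and NumPy are available. DecisionTreeClassifier (CART) is used only as a stand-in for C4.5, which scikit-learn does not provide, and the weighting of accuracy against reduction in the IS-PS fitness is an illustrative assumption.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def fitness_is_ps(mask, X_train, y_train, alpha=0.5):
    """IS-PS: score a selected subset by 1-NN accuracy combined with reduction rate."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=1).fit(X_train[idx], y_train[idx])
    accuracy = knn.score(X_train, y_train)          # classify the full training set
    reduction = 1.0 - idx.size / len(y_train)
    return alpha * accuracy + (1.0 - alpha) * reduction

def evaluate_is_tss(mask, X_train, y_train, X_test, y_test):
    """IS-TSS: train a decision tree on the selected subset, test on held-out data."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return 0.0
    tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
    return tree.score(X_test, y_test)
```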
The second point of view, analyzing the behavior of EAs for data reduction in KD,
is really the most important aspect and the main novelty of this chapter. In
particular, we introduce a stratified EA model for evaluating this approach.
The chapter is organized as follows. In Section 5.2, we explain the main ideas
of IS, give a brief historical review, and describe the two processes in which IS
algorithms take part, IS-PS and IS-TSS. In Section 5.3, we survey the main
classical IS algorithms. In Section 5.4, we introduce the foundations of the EAs
and summarize the basic features of the models considered in this chapter. In
Section 5.5, we provide details on the way EAs may be applied to the IS problem.
In Section 5.6, we deal with the methodology for the experiments. In Section 5.7,
we include the results of the experiments and their analysis. Finally, in Section 5.8,
we present some concluding remarks.