Instance Selection Using Evolutionary Algorithms: An Experimental Study - Advanced Techniques in Knowledge Discovery and Data Mining

5. Instance Selection Using Evolutionary Algorithms: An Experimental Study

José Ramón Cano (1), Francisco Herrera (2), and Manuel Lozano (2)

(1) Dept. of Computer Science, Escuela Politecnica Superior de Linares, University of Jaén, 23700 Jaén, Spain; email: jrcano@decsai.ugr.es

(2) Dept. of Computer Science and Artificial Intelligence, Escuela Tecnica Superior de Ingenieria Informatica, University of Granada, 18071 Granada, Spain; email: herrera, lozano@decsai.ugr.es

In this chapter, we carry out an empirical study of the performance of four representative evolutionary algorithm models under two instance-selection perspectives: prototype selection and training set selection for data reduction in knowledge discovery. The study also compares these algorithms with other nonevolutionary instance-selection algorithms. The results show that the evolutionary instance-selection algorithms consistently outperform the nonevolutionary ones, offering two main advantages simultaneously: better instance-reduction rates and higher classification accuracy.

5.1 Introduction

Advances in digital technology and computing, together with booming Internet use, have led to massive collections of data and information. Research in areas of science from astronomy to the human genome faces the same problem: choking on information. Raw data are rarely of direct use, and manual analysis simply cannot keep pace with the fast growth of data. Knowledge discovery (KD) [34] and data mining (DM) [1] help here; they aim to turn raw data into nuggets of useful knowledge and so create a competitive edge.

The KD process comprises problem comprehension, data comprehension, data preprocessing, DM, evaluation, and development [1], [8], [35]. The first three stages (problem comprehension, data comprehension, and data preprocessing) play a pivotal role in successful DM.

Because of the enormous amounts of data involved, much current research focuses on scaling up DM algorithms. Researchers have also worked on scaling down the data. The central issue in scaling down data is to select the relevant data and then present them to a DM algorithm [25]. This task is carried out in the data-preprocessing phase of the KD process.

Data preprocessing comprises the following strategies: data reduction, data cleaning, data construction, data integration, and data format change. Our attention is focused on data reduction, which can be achieved in many ways:
