Database Reference
In-Depth Information
The application of EAs to these two approaches is accomplished by tackling
two important issues: the specification of the representation of the solutions and
the definition of the fitness function.
5.5.1 Representation
Let us assume that T is a data set with m instances. The search space associated
with the IS of T is constituted by all the subsets of T (the subsets are denoted S ).
Then the chromosomes should represent subsets of T . This is accomplished by
using a binary representation. A chromosome consists of m genes (one for each
instance in T ) with two possible states: 0 and 1. If the gene has 1, then its
associated instance is included in the subset S represented by the chromosome. If
it has 0, then this does not occur.
This representation may not be effective for the application of the IS to TSS
problems with large databases because the chromosomes will have too many
genes. In this case, the EAs may have problems driving the search toward the
better regions. For these cases, we propose using a stratified evolutionary model
based on Figure 5.3 (Section 5.2.2), which applies an evolutionary IS algorithm,
independently, on two nonoverlapping partitions of the data set (even, more than
two disjoint partitions may be considered). In this way, the chromosomes will
have fewer genes, and the effectiveness of the EAs may be improved. Furthermore,
this mechanism may be generalized by considering a greater number of partitions,
which will depend on the number of instances of the data set.
5.5.2 Fitness Function
Let S T be a subset of instances to evaluate. We define a fitness function that
combines two values: the classification performance associated with S and the
percentage of reduction of instances of S with regard to T :
Fitness(S) = Į · clas_per + (1-Į) · porc_red .
(5.2)
The 1-NN classifier (Section 5.2.1) is used for measuring the classification rate,
clas_per , associated with S . It denotes the percentage of correctly classified
objects from T using only S to find the nearest neighbor. For each object x in T the
nearest neighbor is searched among those in the set S , without considering the
proper x when x S . On the other hand, porc_red is defined as:
|
T
|
|
S
|
porc
_
red
100
.
(5.3)
|
T
|
Search WWH ::




Custom Search