10. For each pair $i$, the test set, $t_i$, is $D_i$, and the training set, $T_i$, is the union of all
the other $D_j$, $j \neq i$ (clearly, $D = T_i \cup t_i$ and $T_i \cap t_i = \emptyset$).
Ten trials were run for each data set and IS algorithm. During the $i$th trial, the
algorithm is applied to $T_i$, and the resulting reduced set is then used by the 1-NN
algorithm to classify the elements of $t_i$, yielding a test accuracy.
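The trial loop described above can be sketched as follows. This is a minimal illustration, not the authors' code: the names `ten_fold_partition`, `one_nn_predict`, and `cross_validate` are hypothetical, the partition is assumed to be a plain random split, and `select` stands in for any IS algorithm that maps a training set to a reduced subset of it.

```python
import random
from typing import Callable, List, Sequence, Tuple

Instance = Tuple[Tuple[float, ...], int]  # (feature vector, class label)


def ten_fold_partition(data: Sequence[Instance], seed: int = 0,
                       k: int = 10) -> List[List[Instance]]:
    """Split data into k disjoint subsets D_1..D_k (assumed: random split)."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def one_nn_predict(train: Sequence[Instance], x: Tuple[float, ...]) -> int:
    """Return the label of the instance in train nearest to x."""
    nearest = min(
        train,
        key=lambda inst: sum((a - b) ** 2 for a, b in zip(inst[0], x)),
    )
    return nearest[1]


def cross_validate(data: Sequence[Instance],
                   select: Callable[[List[Instance]], List[Instance]],
                   k: int = 10, seed: int = 0) -> List[float]:
    """One trial per fold: reduce T_i with select, then test 1-NN on t_i."""
    folds = ten_fold_partition(data, seed, k)
    accuracies = []
    for i in range(k):
        test = folds[i]  # t_i = D_i
        # T_i is the union of all D_j with j != i
        train = [inst for j, fold in enumerate(folds) if j != i for inst in fold]
        reduced = select(train)  # hypothetical IS algorithm
        correct = sum(one_nn_predict(reduced, x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return accuracies
```

The sketch assumes the data set has at least `k` instances and that `select` never returns an empty set. Passing `select=lambda train: list(train)` gives the plain 1-NN baseline that uses every training instance.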
5.6.2.2 Instance Selection - Training Set Selection
We followed the stratified approach for IS-TSS shown in Figure 5.3 to carry out
the experiments on the application of the IS algorithms to TSS. In
particular, for each data set, $D$, two partitions are randomly made, each consisting
of two nonoverlapping sets with 50% of the elements: $D = T_{11} \cup T_{12}$ and
$D = T_{21} \cup T_{22}$. The IS algorithms are applied to these sets, returning four sets with a
reduced number of instances: $S_{11}$, $S_{12}$, $S_{21}$, and $S_{22}$. Then two different training sets
are calculated:
$$S_1 = S_{11} \cup S_{12} \quad \text{and} \quad S_2 = S_{21} \cup S_{22}. \tag{5.4}$$
Their associated test sets are $s_i = D \setminus S_i$, $i = 1, 2$. The training sets are used during
the IS process, while the test sets are used to calculate the test accuracy of the
model learned. To determine the quality of the training sets obtained, two learning
algorithms, the classical 1-NN classifier and C4.5 [31], were used on these sets.
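As a rough illustration of how the two training/test pairs are built, the sketch below works on instance indices. The function name `tss_splits` and the `select(part, data)` callback are assumptions made for the example; `select` stands in for any IS algorithm and must return a subset of the indices it receives.

```python
import random
from typing import Callable, List, Sequence, Tuple


def tss_splits(data: Sequence,
               select: Callable[[List[int], Sequence], List[int]],
               seed: int = 0,
               repeats: int = 2) -> List[Tuple[List[int], List[int]]]:
    """Build (training, test) index lists (S_i, s_i) for i = 1..repeats."""
    rng = random.Random(seed)
    n = len(data)
    results = []
    for _ in range(repeats):
        indices = list(range(n))
        rng.shuffle(indices)
        half = n // 2
        halves = (indices[:half], indices[half:])  # T_i1 and T_i2
        selected = set()
        for part in halves:
            # S_i1 and S_i2: reduced sets returned by the IS algorithm
            selected.update(select(part, data))
        train = sorted(selected)                              # S_i = S_i1 U S_i2
        test = [j for j in range(n) if j not in selected]     # s_i = D \ S_i
        results.append((train, test))
    return results
```

Each returned pair can then be handed to a learner such as 1-NN or C4.5: fit on the instances indexed by `train` and measure accuracy on those indexed by `test`.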
5.6.3 Algorithms and Parameters
5.6.3.1 Instance Selection - Prototype Selection
We have executed the following classical IS algorithms: CNN, ENN, RENN,
MCS, Shrink, and Drop1-3. Moreover, we have carried out experiments with a 1-NN
classifier that considers all instances in the training sets.
The parameters used for EAs are:
• GGA considers a population with 10 chromosomes. The crossover rate is 1,
and two mutation rates were considered: 0.01 for changing 1 to 0, and 0.001
in the contrary case. This asymmetry in mutation rates is intended to favor
the presence in the population of solutions with few instances, which is a
desirable feature. GGA was run for 1000 generations.
• SGA employs these same parameters, but considers 10000 offspring
evaluations.
• The population size of the CHC algorithm was 10 chromosomes, and it was
run for 1000 generations.
• The parameters associated with PBIL were: $N_{samples}$ = 10, LR = 0.005, $P_m$ =
0.01, and Mut_Shif = 0.01; 1000 iterations of this algorithm were completed.
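The asymmetric mutation used by GGA and SGA can be illustrated with a short sketch. It assumes the usual binary encoding for IS, in which gene value 1 means the instance is kept and 0 means it is dropped; the function name `asymmetric_mutation` and its parameter names are hypothetical.

```python
import random
from typing import List


def asymmetric_mutation(chromosome: List[int],
                        rng: random.Random,
                        p_one_to_zero: float = 0.01,
                        p_zero_to_one: float = 0.001) -> List[int]:
    """Flip each gene with a probability that depends on its current value."""
    mutated = []
    for gene in chromosome:
        # A selected instance (1) is dropped ten times more readily than
        # an unselected one (0) is added, which biases toward small subsets.
        p = p_one_to_zero if gene == 1 else p_zero_to_one
        mutated.append(1 - gene if rng.random() < p else gene)
    return mutated
```

With the default rates, a 1 gene is ten times more likely to flip than a 0 gene, so repeated mutation tends to shrink the number of selected instances, which is the bias the text describes.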