e.g. using OLDA, SVM, ADBFE. In the subsequent experiments we will denote these variants respectively with the acronyms OLDA-FR, SVM-FR and ADBFE-FR. These algorithms, along with BVQ-FR, will be tested on the datasets listed above, following the experimental procedure described in the previous section. Notice that, in the parameter setting for BVQ-FR, the number of code vectors has been set to a multiple of the number of classes in the dataset, with 200,000 BVQ iterations, whereas the remaining parameter and r come from a manual refinement in three steps. In SVM-FR, we employ a Gaussian radial basis kernel to train the SVM, and we set r to 0.2.
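As an illustration of this parameter setting, the sketch below fixes the number of code vectors as a multiple of the class count and the iteration budget stated above, and configures an RBF-kernel SVM for the SVM-FR variant. The configuration class, the multiple of four and the helper names are hypothetical and only illustrate the choices described in the text; they are not the original experimental code.

```python
# Minimal sketch of the parameter choices described above (assumed names and values).
from dataclasses import dataclass
from sklearn.svm import SVC

@dataclass
class BVQFRConfig:
    n_classes: int
    vectors_per_class: int = 4      # code vectors set to a multiple of the class count (assumed multiple)
    n_iterations: int = 200_000     # BVQ iterations, as stated in the text

    @property
    def n_code_vectors(self) -> int:
        return self.n_classes * self.vectors_per_class

# Example: a two-class dataset such as Heart would get 8 code vectors with this sketch.
cfg = BVQFRConfig(n_classes=2)
print(cfg.n_code_vectors, cfg.n_iterations)

# SVM-FR uses a Gaussian radial basis kernel; r = 0.2 is a separate ranking
# parameter whose exact role is defined earlier in the chapter.
svm = SVC(kernel="rbf")
```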
In Fig. 4.4 the accuracy curves are grouped by dataset to compare the performance of the EDBFM ranking algorithms. For each dataset the accuracy curve obtained by means of a random permutation of the features is also displayed. Notice that the curves of the EDBFM rankings are always located above the random-ranking curve, which reveals the general efficacy of EDBFM ranking. A qualitative comparison between the curves is difficult because of their irregular pattern and their overlaps. The Performance Index calculated for each curve is of help in the analysis and is reported in Table 4.3. Note that missing values in Table 4.3 are due to the impossibility of running computationally expensive algorithms, such as SVM and ADBFE, on datasets with a large number of classes and instances. In Table 4.3, where rows are sorted by increasing complexity of the dataset, we can observe that OLDA-FR and BVQ-FR together dominate the values of the Performance Index on the two-class datasets (Heart-Stat, Heart, Australian, Urban, Wildfire, Landslide), whereas BVQ-FR has a relative dominance on the more complex datasets (Ionosphere, Waveform, CoverType, Segment, Gottigen, Letter, Corine). This is due to the fact that BVQ-FR, being based on a nonparametric model, has a superior performance when working on non-linearly separable classes of objects.
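To make the construction of the accuracy curves concrete, the following Python sketch evaluates a 1NN classifier on the top-k ranked features for increasing k, and builds the random-permutation baseline in the same way. The function names and the five-fold cross-validation are assumptions used for illustration, not the authors' implementation; the ranking itself (from OLDA-FR, BVQ-FR, etc.) is assumed to be given as a list of feature indices sorted by decreasing relevance.

```python
# Sketch of the accuracy-curve procedure behind Fig. 4.4 (assumed details).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def accuracy_curve(X, y, ranking, cv=5):
    """1NN cross-validated accuracy using the top-k ranked features, for k = 1..d."""
    scores = []
    for k in range(1, X.shape[1] + 1):
        cols = list(ranking[:k])
        clf = KNeighborsClassifier(n_neighbors=1)
        scores.append(cross_val_score(clf, X[:, cols], y, cv=cv).mean())
    return np.array(scores)

def random_curve(X, y, seed=0):
    """Baseline curve obtained from a random permutation of the features."""
    rng = np.random.default_rng(seed)
    return accuracy_curve(X, y, rng.permutation(X.shape[1]))
```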
In the second set of experiments we compare the performance of OLDA-FR and BVQ-FR with other ranking methods known in the literature, such as Relief, Gain Ratio and One-Rule. These heuristic methods also calculate a weight for each individual real feature, which allows us to rearrange the features by decreasing weight and to submit the dataset to the 1NN classification algorithm using the same procedure as for the EDBFM-based models. The accuracies calculated in the previous experiment for OLDA-FR and BVQ-FR are now compared with the accuracies obtained using Relief, Gain Ratio and One-Rule. The accuracy curves, gathered by dataset, are shown in Fig. 4.5. The criterion for comparing the curves is the same as in the previous experiment. The general picture of the performances is rather complex, but trends are evidenced by the analysis of the Performance Index. For each dataset the best ranker is highlighted in Table 4.4, where some statistics of the Performance Index are also reported: the mean value for BVQ-FR is the highest, and its variance is the lowest. These statistics indicate a low dispersion of the Performance Index for the BVQ-FR algorithm, which reveals a relatively stable behaviour in comparison with the Relief, Gain Ratio and One-Rule rankers, and with OLDA-FR as well.
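The summary in Table 4.4 can be sketched as follows: for each ranker the Performance Index is collected over all datasets, and its mean and variance describe overall quality and stability, while the per-dataset maximum identifies the best ranker. The exact definition of the index is given earlier in the chapter; the stand-in used here (mean accuracy over the curve) and all names are assumptions for illustration only.

```python
# Sketch of the per-ranker statistics reported in Table 4.4 (assumed index definition).
import numpy as np

def performance_index(acc_curve):
    # Stand-in for the Performance Index: mean accuracy along the curve.
    return float(np.mean(acc_curve))

def summarise(rankers_curves):
    """rankers_curves: dict mapping ranker name -> list of accuracy curves, one per dataset."""
    stats = {}
    for name, curves in rankers_curves.items():
        idx = np.array([performance_index(c) for c in curves])
        stats[name] = {"mean": idx.mean(), "var": idx.var(ddof=1)}
    return stats

def best_ranker_per_dataset(rankers_curves):
    """For each dataset position, return the ranker with the highest index."""
    names = list(rankers_curves)
    idx = np.array([[performance_index(c) for c in rankers_curves[n]] for n in names])
    return [names[j] for j in idx.argmax(axis=0)]
```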
In the star plot (see Fig. 4.6) the index values are shown as radial lines from a common centre point. Points corresponding to the same algorithm are connected by a line with a common style. In the clockwise direction the datasets are sorted by increasing complexity. Notice the part of the diagram where the two-class datasets are located.
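A star plot of this kind can be drawn with a polar axis in matplotlib, one radial spoke per dataset (ordered clockwise by increasing complexity) and one closed polyline per algorithm. The dataset order and the index values in the sketch below are placeholders, not the values from the experiments.

```python
# Minimal sketch of a star plot in the style of Fig. 4.6 (placeholder data).
import numpy as np
import matplotlib.pyplot as plt

datasets = ["HeartStat", "Heart", "Australian", "Ionosphere", "Waveform", "Letter"]  # placeholder order
index_by_algo = {                       # placeholder Performance Index values
    "BVQ-FR":  [0.80, 0.78, 0.82, 0.75, 0.70, 0.68],
    "OLDA-FR": [0.82, 0.80, 0.79, 0.65, 0.60, 0.55],
}

angles = np.linspace(0, 2 * np.pi, len(datasets), endpoint=False)
ax = plt.subplot(projection="polar")
ax.set_theta_direction(-1)              # clockwise ordering of datasets, as in the figure
for name, values in index_by_algo.items():
    vals = np.r_[values, values[:1]]    # close the polygon
    ax.plot(np.r_[angles, angles[:1]], vals, label=name)
ax.set_xticks(angles)
ax.set_xticklabels(datasets)
ax.legend()
plt.show()
```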