Table 4.3 EDBFM ranking: comparison of filter FE algorithms

Dataset      OLDA-FR (ˆ)   BVQ-FR (ˆ)   ADBFE-FR (ˆ)   SVM-FR (ˆ)
HeartStat    0.116         0.411        0.015          0.188
Heart        0.187         0.471        -              0.144
Australian   0.685         0.463        0.180          0.298
Urban        0.543         0.465        0.502          0.075
Wildfires    0.474         0.354        0.277          0.265
Landslides   0.400         0.388        0.224          0.446
Ionosphere   0.096         0.207        0.019          0.017
Waveform     0.651         0.670        0.641          -
CoverType    0.125         0.373        -              -
Segment      0.348         0.595        -              -
Gottigen     0.445         0.456        -              0.447
Letter       0.133         0.287        -              -
Corine       0.484         0.592        -              -
The Performance Index (ˆ) for each of the accuracy curves in Fig. 4.4
concentrated, from Heart to Ionosphere, there is an evident superiority of One Rule
over the other rankers. By contrast, where more complex datasets are concentrated,
from Waveform to Corine, BVQ-FR tends to outperform the other rankers, whose
performance decreases more rapidly as the dataset complexity increases.
Another comparative indicator of performance is the number of features needed
to reach 90% of total accuracy (see Table 4.5). This indicator is a relative
measure of the steepness of the curve: it indicates the ranker's ability to reach
higher accuracies with relatively small feature subsets. On this indicator BVQ-FR
outperforms all other rankers.
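
The indicator described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name features_to_reach and the sample curve are hypothetical, and "total accuracy" is assumed here to mean the accuracy obtained with all features (the last point of the curve).

```python
from typing import Sequence


def features_to_reach(accuracy_curve: Sequence[float], fraction: float = 0.9) -> int:
    """Return the smallest number of top-ranked features whose accuracy
    reaches `fraction` of the accuracy obtained with all features.

    accuracy_curve[k - 1] is the accuracy using the k best-ranked features.
    Assumption: "total accuracy" is the last value of the curve.
    """
    if not accuracy_curve:
        raise ValueError("accuracy_curve must not be empty")
    target = fraction * accuracy_curve[-1]
    for k, accuracy in enumerate(accuracy_curve, start=1):
        if accuracy >= target:
            return k
    # Unreachable for fraction <= 1 and non-negative accuracies; kept for safety.
    return len(accuracy_curve)


# Hypothetical accuracy curve for six ranked features (not data from the source).
example_curve = [0.40, 0.62, 0.75, 0.80, 0.82, 0.83]
needed = features_to_reach(example_curve)
```

A steeper curve yields a smaller value, which is why the indicator can be read as a measure of how quickly a ranker concentrates useful features at the top of the list.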
Let us observe a ranking model in more detail to highlight its usefulness in
supporting cost-benefit informed decision making. In Fig. 4.7 (left), for the
Wildfires dataset, the accuracy curve obtained for BVQ-FR is overlaid with the curve
of cumulative weights; the horizontal axis represents the features sorted by rank.
Notice that the first nine features, which are 50% of the total, represent half the
cost of the entire dataset but account for over 70% of the total weight and over 98%
of the total achievable accuracy. Analogously, in Fig. 4.7 (right), for the CoverType
dataset, the first feature holds 17% of the total weight of the features, whereas the
first six features (50% of the total) account for over 70% of the total weight and
over 80% of the accuracy achievable on the full dataset. If the individual costs of
the features are given, it is possible to construct a detailed cost function. As a
consequence, it is evident that the proposed methodology can guarantee the best ratio
between the cost of feature acquisition and informative power.
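
The cost-benefit reading above can be illustrated with a short Python sketch. The weights and costs below are hypothetical values chosen only to mimic the proportions quoted for CoverType (first feature about 17% of the weight, first half of the features over 70%); they are not data from the source, and the helper names are assumptions.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_share(values: Sequence[float]) -> List[float]:
    """Running fraction of the total covered by the first k values."""
    total = sum(values)
    return [partial / total for partial in accumulate(values)]


def smallest_prefix(weights: Sequence[float], min_weight_share: float) -> int:
    """Number of top-ranked features needed to hold at least
    `min_weight_share` of the total weight."""
    for k, share in enumerate(cumulative_share(weights), start=1):
        if share >= min_weight_share:
            return k
    return len(weights)


# Hypothetical feature weights, already sorted by rank (best first).
weights = [0.17, 0.15, 0.13, 0.11, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02]
# Hypothetical acquisition costs; equal costs make cost share equal feature share.
costs = [1.0] * len(weights)

weight_share = cumulative_share(weights)
cost_share = cumulative_share(costs)
k = smallest_prefix(weights, 0.70)
```

With individual costs in place of the uniform list, cost_share becomes the detailed cost function mentioned in the text, and comparing it with weight_share at each rank shows how much informative power each additional unit of cost buys.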
As described above, the index ˆ has been used to compare the relative
performance of ranking algorithms on a dataset. To assess the overall performance
of each algorithm, the number of times that the algorithm achieved the highest ˆ
was counted. These aleatory results, however, require a test of statistical significance
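
The counting step can be sketched as follows, using the values of Table 4.3 as input. This is a minimal illustration under stated assumptions: missing entries ("-") are skipped, ties would credit every tied algorithm, and the function name count_wins is hypothetical. The tally it produces describes Table 4.3 only and is not necessarily the count reported by the authors.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

ALGORITHMS = ("OLDA-FR", "BVQ-FR", "ADBFE-FR", "SVM-FR")

# Values from Table 4.3; None marks an entry shown as "-" in the table.
TABLE_4_3: Dict[str, Tuple[Optional[float], ...]] = {
    "HeartStat": (0.116, 0.411, 0.015, 0.188),
    "Heart": (0.187, 0.471, None, 0.144),
    "Australian": (0.685, 0.463, 0.180, 0.298),
    "Urban": (0.543, 0.465, 0.502, 0.075),
    "Wildfires": (0.474, 0.354, 0.277, 0.265),
    "Landslides": (0.400, 0.388, 0.224, 0.446),
    "Ionosphere": (0.096, 0.207, 0.019, 0.017),
    "Waveform": (0.651, 0.670, 0.641, None),
    "CoverType": (0.125, 0.373, None, None),
    "Segment": (0.348, 0.595, None, None),
    "Gottigen": (0.445, 0.456, None, 0.447),
    "Letter": (0.133, 0.287, None, None),
    "Corine": (0.484, 0.592, None, None),
}


def count_wins(table: Dict[str, Tuple[Optional[float], ...]],
               algorithms: Sequence[str]) -> Counter:
    """Count, per algorithm, the datasets on which it has the highest index."""
    wins = Counter({name: 0 for name in algorithms})
    for scores in table.values():
        available = [(s, n) for s, n in zip(scores, algorithms) if s is not None]
        if not available:
            continue
        best = max(score for score, _ in available)
        for score, name in available:
            if score == best:  # a tie credits every tied algorithm
                wins[name] += 1
    return wins


wins = count_wins(TABLE_4_3, ALGORITHMS)
```

A raw win count of this kind ignores how large each margin is, which is one reason a significance test is needed before drawing conclusions from it.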