As a testbed for the experiments, 13 multivariate datasets have been considered.
Eight of these datasets (Heart, HeartStat, Australian, Ionosphere, Waveform,
Segment, CoverType, Letter) have been drawn from the UCI repository [31], selected
for their large number of instances, classes and features, as is appropriate when
testing ranking algorithms. Five more datasets (Urban, Wildfire, Landslide, Corine,
Gottingen) have been extracted from large geographic data collections. These
datasets, which include both discrete and continuous variables, are heterogeneous
collections of data, well suited to challenging the selective capability of our
method and to highlighting the properties of the ranking model. The datasets Urban,
Wildfire, Landslide and Corine originate from the same data collection; they differ
from each other in the feature chosen as class attribute. Urban, Wildfire and
Landslide have balanced classes, that is, in these datasets all classes are
represented by an equal number of instances. The geographic dataset named Gottingen
comes from a different collection [4]; its features correspond to satellite Earth
observation imagery in different wavelength bands. The characteristics of all the
datasets are summarized in Table 4.2, where the datasets are sorted by number of
classes, then by number of features, and finally by number of instances. This
ordering also reflects increasing dataset complexity, ranging from a simple,
perfectly balanced two-class dataset with relatively few instances, such as Urban,
up to Corine, a large dataset with 26 classes. All datasets have gone through a
common preprocessing step in which each feature has been normalized to the range
[0, 1], to give equal importance to each feature during learning.
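The preprocessing step described above can be illustrated with a minimal sketch of
per-feature min-max scaling. The text only states that each feature is normalized
to the range [0, 1]; the function name min_max_normalize, the toy data, and the
choice to map a constant feature to 0.0 are assumptions made for illustration, not
details taken from the experiments.

```python
def min_max_normalize(rows):
    """Rescale every feature (column) of `rows` to the range [0, 1]."""
    if not rows:
        return []
    # Transpose so that each tuple holds all values of one feature.
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    normalized = []
    for row in rows:
        scaled = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # Assumption: a constant feature (span == 0) is mapped to 0.0,
            # since the source does not say how such features are handled.
            scaled.append((value - low) / span if span else 0.0)
        normalized.append(scaled)
    return normalized


# Hypothetical toy data: three instances, two features on very different scales.
data = [[2.0, 100.0], [4.0, 300.0], [6.0, 200.0]]
normalized = min_max_normalize(data)
print(normalized)
```

After scaling, both features span the same interval, so a distance-based learner
no longer favors the feature with the larger raw values.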
The first set of experiments aims to highlight how the performance of the FR
algorithm varies when different built-in FE algorithms are used. As already
mentioned, the algorithm BVQ-FR can be transformed by changing the FE algorithm,
Table 4.2 Testbed datasets
Origin  Dataset name  # Classes  # Features  # Instances
UCI     HeartStat             2          13          270
UCI     Heart                 2          13          293
UCI     Australian            2          14          690
GIS     Urban                 2          18        3,972
GIS     Wildfires             2          18        5,359
GIS     Landslides            2          18       23,663
UCI     Ionosphere            2          34          351
UCI     Waveform              3          40        5,000
UCI     CoverType             7          12       58,104
UCI     Segment               7          19        2,310
GIS     Gottingen            14           8       28,083
UCI     Letter               26          16       20,000
GIS     Corine               26          18       48,379
Datasets are sorted by number of classes, then by number of features, and finally by number of instances
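
The three-level ordering stated in the note can be checked with a composite sort
key. The following is a minimal sketch: the figures are copied from Table 4.2,
while the Dataset tuple and the variable names are hypothetical and introduced
only for this illustration.

```python
from collections import namedtuple

# Hypothetical record type; the fields mirror the columns of Table 4.2.
Dataset = namedtuple("Dataset", "origin name classes features instances")

# Rows copied from Table 4.2, in the order in which the table lists them.
table = [
    Dataset("UCI", "HeartStat", 2, 13, 270),
    Dataset("UCI", "Heart", 2, 13, 293),
    Dataset("UCI", "Australian", 2, 14, 690),
    Dataset("GIS", "Urban", 2, 18, 3972),
    Dataset("GIS", "Wildfires", 2, 18, 5359),
    Dataset("GIS", "Landslides", 2, 18, 23663),
    Dataset("UCI", "Ionosphere", 2, 34, 351),
    Dataset("UCI", "Waveform", 3, 40, 5000),
    Dataset("UCI", "CoverType", 7, 12, 58104),
    Dataset("UCI", "Segment", 7, 19, 2310),
    Dataset("GIS", "Gottingen", 14, 8, 28083),
    Dataset("UCI", "Letter", 26, 16, 20000),
    Dataset("GIS", "Corine", 26, 18, 48379),
]

# Tuples compare element by element, so this key sorts by number of classes
# first, then by number of features, and finally by number of instances.
ordered = sorted(table, key=lambda d: (d.classes, d.features, d.instances))

# True when the table is already in the stated order.
is_sorted = ordered == table
print(is_sorted)
```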
 