The representation of the data in a vector space allows the usage of any conventional classifier. For our experiments, we trained logistic regression from the Weka software package.⁴ We used the mapped instances of D^train_2 as training data for logistic regression. When classifying an instance x ∈ D^test, we map x into the same vector space as the instances of D^train_2, i.e., we calculate the DTW-distances between x and the selected instances x_sel,1, x_sel,2, ..., x_sel,l, and we use these distances as features of x. Once the features of x are calculated, we use the trained classifier (logistic regression in our case) to classify x.
We used 10-fold cross-validation to evaluate this feature construction-based approach. Similarly to the case of INSIGHT and FastAWARD, the number of selected instances corresponds to 10% of the entire training data; however, as described previously, for the feature construction-based approach we selected the instances from D^train_1 (not from D^train). We tested several variants of the approach; for two of them, the resulting accuracies are shown in the last two columns of Table 11.5. The results shown in the fourth column of Table 11.5 (denoted as HubFeatures) refer to the case when we performed hub-based instance selection on D^train_1 using the good 1-occurrence score. The results shown in the last column of Table 11.5 (denoted as RndFeatures) refer to the case when the instances were randomly selected from D^train_1.
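The feature construction and classification steps described above can be sketched as follows. This is a minimal illustration, not the chapter's actual implementation: the helper names `dtw_distance` and `to_features` are hypothetical, and the experiments in the chapter use Weka's logistic regression, which is not reproduced here.

```python
def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

def to_features(x, selected):
    """Map a time series x to the vector of its DTW-distances to the
    selected instances x_sel,1, ..., x_sel,l; these distances serve as
    the features of x."""
    return [dtw_distance(x, s) for s in selected]
```

The vectors produced by `to_features` for the training series can then be fed to any conventional classifier; at test time, a new series is mapped into the same space with the same selected instances before being classified.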
Both HubFeatures and RndFeatures outperform FastAWARD in the clear majority of the cases: they are significantly better than FastAWARD for 23 and 21 data sets, respectively, while they are significantly worse for only one data set. INSIGHT, HubFeatures, and RndFeatures can be considered alternative approaches, as their overall performances are close to each other. Therefore, in a real-world application, one can use cross-validation to select the approach that best suits the particular application.
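The hub-based instance selection behind HubFeatures can be sketched as follows, assuming the definition of the good 1-occurrence score used earlier in the chapter: GN1(x) counts the instances whose nearest neighbor is x and whose label agrees with the label of x. The function names, the 10% default fraction, and the toy scalar distance in the usage example are illustrative only.

```python
def good_1_occurrence_scores(series, labels, dist):
    """GN1(x) for each instance x: the number of instances whose nearest
    neighbor is x and whose label matches the label of x."""
    n = len(series)
    scores = [0] * n
    for i in range(n):
        # nearest neighbor of instance i among all other instances
        nn = min((j for j in range(n) if j != i),
                 key=lambda j: dist(series[i], series[j]))
        if labels[nn] == labels[i]:
            scores[nn] += 1
    return scores

def select_top_instances(series, labels, dist, fraction=0.1):
    """Keep the indices of the top `fraction` of instances by GN1 score."""
    scores = good_1_occurrence_scores(series, labels, dist)
    k = max(1, int(round(fraction * len(series))))
    ranked = sorted(range(len(series)),
                    key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

For time series, `dist` would be a DTW-based distance; RndFeatures corresponds to replacing this score-based ranking with a uniform random draw of the same number of instances.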
11.7 Conclusions and Outlook
We devoted this chapter to the recently observed phenomenon of hubness which
characterizes numerous real-world data sets, especially high-dimensional ones. We
surveyed hubness-aware classifiers and instance selection. Finally, we proposed a
hubness-based feature construction approach. The approaches we reviewed were
originally published in various research papers using slightly different notations and
terminology. In this chapter, we presented all the approaches within an integrated
framework using uniform notation and terminology. Hubness-aware classifiers were
originally developed for vector classification. Here, we pointed out that these classi-
fiers can be used for time-series classification given that an appropriate time-series
distance measure is present. To the best of our knowledge, most of the surveyed
approaches have not yet been used for time-series data. We performed extensive
experimental evaluation of the state-of-the-art hubness-aware classifiers on a large