The representation of the data in a vector space allows the usage of any conventional classifier. For our experiments, we trained logistic regression from the Weka software package.⁴ We used the mapped instances of D^train_2 as training data for logistic regression. When classifying an instance x ∈ D^test, we map x into the same vector space as the instances of D^train_2, i.e., we calculate the DTW-distances between x and the selected instances x_sel,1, x_sel,2, ..., x_sel,l, and we use these distances as features of x. Once the features of x are calculated, we use the trained classifier (logistic regression in our case) to classify x.
We used 10-fold cross-validation to evaluate this feature construction-based approach. Similarly to the case of INSIGHT and FastAWARD, the number of selected instances corresponds to 10% of the entire training data; however, as described previously, for the feature construction-based approach we selected the instances from D^train_1 (not from D^train). We tested several variants of the approach; for two of them, the resulting accuracies are shown in the last two columns of Table 11.5. The results shown in the fourth column of Table 11.5 (denoted as HubFeatures) refer to the case when we performed hub-based instance selection on D^train_1 using the good 1-occurrence score. The results shown in the last column of Table 11.5 (denoted as RndFeatures) refer to the case when the instances were randomly selected from D^train_1.
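The feature construction and classification steps described above can be sketched as follows. This is a minimal illustration, not the chapter's actual implementation: the helper names `dtw_distance` and `to_features` are hypothetical, and the experiments in the chapter use Weka's logistic regression, which is not reproduced here.

```python
def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

def to_features(x, selected):
    """Map a time series x to the vector of its DTW-distances to the
    selected instances x_sel,1, ..., x_sel,l; these distances serve as
    the features of x."""
    return [dtw_distance(x, s) for s in selected]
```

The vectors produced by `to_features` for the training series can then be fed to any conventional classifier; at test time, a new series is mapped into the same space with the same selected instances before being classified.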
Both HubFeatures and RndFeatures outperform FastAWARD in the clear majority of the cases: they are significantly better than FastAWARD for 23 and 21 data sets, respectively, while they are significantly worse for only one data set. INSIGHT, HubFeatures, and RndFeatures can be considered alternative approaches, as their overall performances are close to each other. Therefore, in a real-world application, one can use cross-validation to select the approach that best suits the particular application.
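The hub-based instance selection behind HubFeatures can be sketched as follows, assuming the definition of the good 1-occurrence score used earlier in the chapter: GN1(x) counts the instances whose nearest neighbor is x and whose label agrees with the label of x. The function names, the 10% default fraction, and the toy scalar distance in the usage example are illustrative only.

```python
def good_1_occurrence_scores(series, labels, dist):
    """GN1(x) for each instance x: the number of instances whose nearest
    neighbor is x and whose label matches the label of x."""
    n = len(series)
    scores = [0] * n
    for i in range(n):
        # nearest neighbor of instance i among all other instances
        nn = min((j for j in range(n) if j != i),
                 key=lambda j: dist(series[i], series[j]))
        if labels[nn] == labels[i]:
            scores[nn] += 1
    return scores

def select_top_instances(series, labels, dist, fraction=0.1):
    """Keep the indices of the top `fraction` of instances by GN1 score."""
    scores = good_1_occurrence_scores(series, labels, dist)
    k = max(1, int(round(fraction * len(series))))
    ranked = sorted(range(len(series)),
                    key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

For time series, `dist` would be a DTW-based distance; RndFeatures corresponds to replacing this score-based ranking with a uniform random draw of the same number of instances.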
11.7 Conclusions and Outlook
We devoted this chapter to the recently observed phenomenon of hubness which
characterizes numerous real-world data sets, especially high-dimensional ones. We
surveyed hubness-aware classifiers and instance selection. Finally, we proposed a
hubness-based feature construction approach. The approaches we reviewed were
originally published in various research papers using slightly different notations and
terminology. In this chapter, we presented all the approaches within an integrated
framework using uniform notation and terminology. Hubness-aware classifiers were
originally developed for vector classification. Here, we pointed out that these classi-
fiers can be used for time-series classification given that an appropriate time-series
distance measure is present. To the best of our knowledge, most of the surveyed
approaches have not yet been used for time-series data. We performed extensive
experimental evaluation of the state-of-the-art hubness-aware classifiers on a large