large k values might not make a lot of sense, as they would breach the locality assumption by introducing neighbors into the kNN sets that are not relevant for the instance to be classified. According to our experiments, HIKNN achieved very good performance for k ∈ [5, 15]; therefore, setting k = 5 or k = 10 by default would usually lead to reasonable results in practice.
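To illustrate how such a default could be validated in practice, the following minimal sketch sweeps k over the recommended range [5, 15] with cross-validation. It uses scikit-learn's plain KNeighborsClassifier as a stand-in, since HIKNN itself is not available there, and synthetic data in place of real (vectorized) time series; all of this is our illustration, not code from the chapter.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    # Synthetic stand-in data; in practice X would hold the (vectorized) time series.
    X, y = make_classification(n_samples=300, n_features=20, random_state=0)

    best_k, best_acc = 5, 0.0
    for k in range(5, 16):  # sweep the recommended range k in [5, 15]
        acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
        if acc > best_acc:
            best_k, best_acc = k, acc
    print(f"best k = {best_k}, cross-validated accuracy = {best_acc:.3f}")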
11.6 Instance Selection and Feature Construction for
Time-Series Classification
In the previous section, we described four approaches that take hubness into account in order to make time-series classification more accurate. In various applications, however, besides classification accuracy, the classification time is also important. Therefore, in this section, we present hubness-aware approaches for speeding up time-series classification. First, we describe instance selection for kNN classification of time series. Subsequently, we focus on feature construction.
11.6.1 Instance Selection for Speeding Up Time-Series Classification
Attempts to speed up DTW-based nearest neighbor classification fall into four major categories: (i) speeding up the calculation of the distance of two time series (e.g., by limiting the warping window size), (ii) indexing, (iii) reducing the length of the time series used, and (iv) instance selection. The first class of techniques was already mentioned in Sect. 11.3. For an overview of techniques for indexing and reduction of the length of time series, and of more advanced approaches for limiting the warping window size, we refer to [8] and the references therein. In this section, we focus on how to speed up time-series classification via instance selection. We note that instance selection is orthogonal to the other speed-up techniques, i.e., instance selection can be combined with those techniques in order to achieve the highest efficiency.
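To make category (i) concrete, the sketch below computes the DTW distance under a limited warping window (a Sakoe-Chiba band). This is our own minimal illustration, not code from the chapter; the function name and the default window half-width are assumptions.

    import numpy as np

    def dtw_distance(a, b, window=10):
        # DTW distance between 1-D series a and b, with warping restricted
        # to a Sakoe-Chiba band of half-width `window` around the diagonal.
        n, m = len(a), len(b)
        window = max(window, abs(n - m))  # the band must cover the diagonal
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(max(1, i - window), min(m, i + window) + 1):
                d = (a[i - 1] - b[j - 1]) ** 2
                cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
        return np.sqrt(cost[n, m])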
Instance selection (also known as numerosity reduction or prototype selection) aims at discarding most of the training time series while keeping only the most informative ones, which are then used to classify unlabeled instances. In the case of conventional nearest neighbor classification, the instance to be classified, denoted as x, will be compared to all the instances of the training data set. In contrast, when applying instance selection, x will only be compared to the selected instances of the training data. For time-series classification, despite the techniques aiming at speeding up DTW calculations, the calculation of the DTW distance is still computationally expensive; therefore, when selecting a relatively small number of instances, such as 10% of the training data, instance selection can substantially speed up the classification of time series.
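As an illustration of this workflow, the sketch below, which reuses dtw_distance from the previous listing, scores each training series by a simple hubness-inspired criterion (how often it occurs as the nearest neighbor of a same-labeled training instance), keeps the top 10%, and classifies new series by 1-NN against the selected instances only. The scoring rule and all names here are our own assumptions, not the specific selector presented in this chapter.

    import numpy as np

    def select_instances(X_train, y_train, ratio=0.1, window=10):
        # Score each series by its number of "good" 1-occurrences, i.e., how
        # often it is the nearest neighbor of a same-labeled training instance.
        n = len(X_train)
        scores = np.zeros(n)
        for i in range(n):
            dists = [dtw_distance(X_train[i], X_train[j], window) if j != i else np.inf
                     for j in range(n)]
            nn = int(np.argmin(dists))
            if y_train[nn] == y_train[i]:
                scores[nn] += 1
        keep = np.argsort(scores)[::-1][:max(1, int(ratio * n))]
        return keep  # indices of the selected training series

    def classify_1nn(x, X_selected, y_selected, window=10):
        # 1-NN classification of x against the selected instances only.
        dists = [dtw_distance(x, s, window) for s in X_selected]
        return y_selected[int(np.argmin(dists))]

With, say, 10% of the training data retained, each classification requires roughly ten times fewer DTW computations than conventional 1-NN over the full training set.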
 