Information Technology Reference
In-Depth Information
While instance selection is well explored for general nearest neighbor
classification, see e.g. [ 1 , 6 , 19 , 24 , 32 ], there are only few works for the case
of time series. Xi et al. [ 57 ] presented the FastAWARD approach and showed that it
outperforms state-of-the-art, general-purpose instance selection techniques applied
for time-series.
FastAWARD follows an iterative procedure for discarding time series: in each
iteration, the rank of all the time series is calculated and the one with lowest rank
is discarded. Thus, each iteration corresponds to a particular number of kept time
series. Furthermore, Xi et al. argue that the optimal warping window size depends
on the number of kept time series. Therefore, FastAWARD calculates the optimal
warping window size dependent on the number of kept time series.
In this section, we present a hubness-aware instance selection technique which
was originally introduced in [ 9 ]. This approach is simpler and therefore compu-
tationally much cheaper than FastAWARD while it selects better instances, i.e.,
instances that allowmore accurate classification of time-series than the ones selected
by FastAWARD.
In [ 9 ] coverage graphs were proposed tomodel instance selection, and the instance
selection problem was formulated as finding the appropriate subset of vertices of
the coverage graph. Furthermore, it was shown that maximizing the coverage is
NP-complete in general. On the other hand, for the case of time-series classification,
a simple approach performed surprisingly well. This approach is called In stance
S elect i on based on G raph-coverage and H ubness for T ime-series or INSIGHT.
INSIGHT performs instance selection by assigning a score to each instance and
selecting instances with the highest scores (see Algorithm 4), therefore the “intel-
ligence” of INSIGHT is hidden in the applied score function. Next, we explain the
suitability of several score functions in the light of the hubness property.
Good 1-occurrence Score— INSIGHT can use scores that take into account how
many times an instance appears as good neighbor of other instances. Thus, a simple
score function is the good 1-occurrence score GN 1 (
x
)
.
Relative Score— While x is being a good hub, at the same time it may appear as
bad neighbor of several other instances. Thus, INSIGHT can also consider scores
that take bad occurrences into account. This leads to scores that relate the good
occurrence of an instance x to either its total occurrence or to its bad occurrence.
For simplicity, we focus on the following relative score , however, other variations
could be used too: relative score RS
of a time series x is the fraction of good
1-occurrences and total occurrences plus one (to avoid division by zero):
(
x
)
GN 1 (
x
)
RS
(
x
) =
1 .
(11.22)
N 1 (
x
) +
Xi's score— Notably, GN k (
)
and BN k (
)
allows us to interpret the ranking
criterion used by Xi et al. in FastAWARD [ 57 ] as another form of score for relative
hubness:
x
x
XI
(
x
) =
GN 1 (
x
)
2 BN 1 (
x
).
(11.23)
 
Search WWH ::




Custom Search