Algorithm 4 INSIGHT
Require: Time series dataset D, score function g(x) /* e.g., one of GN1(x), RS(x) or XI(x) */, number of selected instances n_sel
Ensure: Set of selected instances (time series) D'
1: Calculate the score function g(x) for all x ∈ D
2: Sort all the time series in D according to their scores g(x)
3: Select the top-ranked n_sel time series and return the set containing them
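The three steps translate almost directly into code. The following is a minimal sketch in Python; the names are illustrative, and the score function g(x) (e.g., the good 1-occurrence score GN1) is assumed to be supplied by the caller, since its computation is described elsewhere in the chapter.

```python
import numpy as np

def insight(D, score_fn, n_sel):
    """Minimal sketch of INSIGHT: keep the n_sel highest-scoring time series.

    D        -- list of time series (e.g., 1-D numeric arrays)
    score_fn -- stands for g(x), e.g. GN1(x); assumed to be provided by the caller
                and to take the instance and the full dataset as arguments
    n_sel    -- number of instances to select
    """
    # 1: calculate the score g(x) for every time series x in D
    scores = np.array([score_fn(x, D) for x in D])
    # 2: sort the time series by descending score
    ranking = np.argsort(-scores)
    # 3: return the n_sel top-ranked time series
    return [D[i] for i in ranking[:n_sel]]
```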
As reported in [9], INSIGHT outperforms FastAWARD both in terms of classification accuracy and execution time. The second and third columns of Table 11.5
present the average accuracy and corresponding standard deviation for each data set,
for the case when the number of selected instances is equal to 10% of the size of
the training set. The experiments were performed according to the 10-fold cross-
validation protocol. For INSIGHT, the good 1-occurrence score is used, but we note
that similar results were achieved for the other scores too.
In the clear majority of cases, INSIGHT substantially outperformed FastAWARD.
In the few remaining cases, the differences are remarkably small (and not
significant in light of the corresponding standard deviations). According to the
analysis reported in [9], one of the major reasons for the suboptimal performance
of FastAWARD is that the skewness degrades during FastAWARD's iterative
instance selection procedure, and therefore FastAWARD is not able to select the best
instances in the end. This is crucial because FastAWARD discards the worst instance
in each iteration, and therefore the final iterations have a substantial impact on which
instances remain, i.e., which instances will be selected by FastAWARD.
11.6.2 Feature Construction
As shown in Sect. 11.6.1, the instance selection approach focusing on good hubs leads
to overall good results. Previously, once the instances were selected, we simply used
them as training data for the kNN classifier. In a more advanced classification schema,
instead of simply performing nearest neighbor classification, we can use distances
from the selected instances as features. This is described in detail below.
First, we split the training data $D^{train}$ into two disjoint subsets $D_1^{train}$ and $D_2^{train}$, i.e., $D_1^{train} \cap D_2^{train} = \emptyset$ and $D_1^{train} \cup D_2^{train} = D^{train}$. We select some instances from $D_1^{train}$ and denote these selected instances as $x_{sel,1}, x_{sel,2}, \ldots, x_{sel,l}$. For each instance $x \in D_2^{train}$, we calculate its DTW-distance from the selected instances and use these distances as features of $x$. This way, we map each instance $x \in D_2^{train}$ into a vector space:

$$x^{mapped} = \left( d_{DTW}(x, x_{sel,1}),\; d_{DTW}(x, x_{sel,2}),\; \ldots,\; d_{DTW}(x, x_{sel,l}) \right) \quad (11.24)$$
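A possible realization of this mapping is sketched below in Python, assuming the time series are 1-D numeric sequences. The helper dtw_distance is a plain textbook DTW recursion included only to keep the sketch self-contained; any DTW implementation could be substituted. The function names and the example data are illustrative, not part of the original method description.

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook dynamic-time-warping distance between two 1-D series
    (quadratic time, no warping-window constraint)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

def map_to_distance_features(X, selected):
    """Map each series x in X to the vector of its DTW-distances from the
    selected instances, i.e., the representation of Eq. (11.24)."""
    return np.array([[dtw_distance(x, s) for s in selected] for x in X])

# Illustrative usage, assuming D1_train / D2_train and a score function exist:
#   selected  = insight(D1_train, gn1_score, n_sel=10)
#   X2_mapped = map_to_distance_features(D2_train, selected)
```

The mapped vectors can then be used as ordinary feature vectors, so any conventional vector-space classifier can be trained on the transformed instances of $D_2^{train}$.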