Database Reference
In-Depth Information
Table 2.
Running time between two sequences from the SEALS
dataset.
Running Time (sec)
Error(%)
forK=60
δ
E
Exact
Approximate for K tries
15
30
60
2
0.25
0.06
0.0038
0.006
0.00911
6.586
2
0.5
0.04
0.0032
0.00441
0.00921
4.343
4
0.25
0.17
0.0581
0.00822
0.01482
11.470
4
0.5
0.1
0.00481
0.00641
0.01031
9.699
6
0.25
0.331
0.00731
0.01151
0.01933
17.123
6
0.5
0.301
0.0062
0.00921
0.01332
22.597
6.2. Clustering using the Approximation Algorithm
We compare the clustering performance of our method to the widely used
Euclidean and DTW distance functions. Specifically:
(i) The Euclidean distance is only defined for sequences of the same length
(and the lengths of our sequences vary considerably). We tried to offer
the best possible comparison between every pair of sequences, by slid-
ing the shorter of the two sequences across the longer one and recording
their minimum distance. Like this we can get the best possible results
out of the Euclidean method, imposing some time overhead. However,
the computation of the Euclidean distance is the least complicated
one.
(ii) The DTW can be used to match sequences of different length. In both
DTW and Euclidean we normalized the data before computing the
distances. Our method does not need any normalization, since it com-
putes the necessary translations.
(iii) For LCSS we used a randomized version with and without sampling,
and for various values of
. The time and the correct clusterings repre-
sent the average values of 15 runs of the experiment. This is necessary
due to the randomized nature of our approach.
δ
6.2.1. Determining the Values for
δ
and
ε
The choice of values for the parameters
are clearly dependent on
the application and the dataset. For most datasets we had at our disposal
we discovered that setting
δ
and
ε
to more than 20-30% of the sequences length
did not yield significant improvement. Furthermore, after some point the
δ
Search WWH ::




Custom Search