INDEXING TIME-SERIES UNDER CONDITIONS OF NOISE - Data Mining in Time Series Databases

Database Reference

In-Depth Information

Table 2.

Running time between two sequences from the SEALS

dataset.

Running Time (sec)

Error(%)

forK=60

δ

E

Exact

Approximate for K tries

15

30

60

2

0.25

0.06

0.0038

0.006

0.00911

6.586

2

0.5

0.04

0.0032

0.00441

0.00921

4.343

4

0.25

0.17

0.0581

0.00822

0.01482

11.470

4

0.5

0.1

0.00481

0.00641

0.01031

9.699

6

0.25

0.331

0.00731

0.01151

0.01933

17.123

6

0.5

0.301

0.0062

0.00921

0.01332

22.597

6.2. Clustering using the Approximation Algorithm

We compare the clustering performance of our method to the widely used

Euclidean and DTW distance functions. Specifically:

(i) The Euclidean distance is only defined for sequences of the same length

(and the lengths of our sequences vary considerably). We tried to offer

the best possible comparison between every pair of sequences, by slid-

ing the shorter of the two sequences across the longer one and recording

their minimum distance. Like this we can get the best possible results

out of the Euclidean method, imposing some time overhead. However,

the computation of the Euclidean distance is the least complicated

one.

(ii) The DTW can be used to match sequences of different length. In both

DTW and Euclidean we normalized the data before computing the

distances. Our method does not need any normalization, since it com-

putes the necessary translations.

(iii) For LCSS we used a randomized version with and without sampling,

and for various values of

. The time and the correct clusterings repre-

sent the average values of 15 runs of the experiment. This is necessary

due to the randomized nature of our approach.

δ

6.2.1. Determining the Values for

δ

and

ε

The choice of values for the parameters

are clearly dependent on

the application and the dataset. For most datasets we had at our disposal

we discovered that setting

δ

and

ε

to more than 20-30% of the sequences length

did not yield significant improvement. Furthermore, after some point the

δ

Data Mining in Time Series Databases

Search WWH ::

Custom Search

Home