INDEXING TIME-SERIES UNDER CONDITIONS OF NOISE - Data Mining in Time Series Databases

Database Reference

In-Depth Information

clusters in the datasets (Figure 21). As the number of clusters increased

the performance of the algorithm improved considerably. This behavior is

expected and it is similar to the behavior of recent proposed index structures

for high dimensional data [6,9,22]. On the other hand if the dataset has no

clusters, the performance of the algorithm degrades, since the majority of

the sequences have almost the same distance to the query. This behavior

follows again the same pattern of high dimensional indexing methods [6,43].

7. Conclusions

We have presented ecient techniques to accurately compute the simi-

larity between time-series with significant amount of noise. Our distance

measure is based on the LCSS model and performs very well for noisy sig-

nals. Since the exact computation is inecient, we presented approximate

algorithms with provable performance bounds. Moreover, we presented an

ecient index structure, which is based on hierarchical clustering, for sim-

ilarity (nearest neighbor) queries. The distance that we use is not a metric

and therefore the triangle inequality does not hold. However, we prove that

a similar inequality holds (although a weaker one) that allows to prune

parts of the datasets without any false dismissals.

Our experiments indicate that the approximation algorithm can be used

to get an accurate and fast estimation of the distance between two time-

series even under noisy conditions. Also, results from the index evaluation

show that we can achieve good speed-ups for searching similar sequences

comparing with the brute force linear scan.

We plan to investigate biased sampling to improve the running time of

the approximation algorithms, especially when full rigid transformations

(e.g. shifting, scaling and rotation) are necessary. Another approach to

index time-series for similarity retrieval is to use embeddings and map the

set of sequences to points in a low dimensional Euclidean space [15]. The

challenge of course is to find an embedding that approximately preserves the

original pairwise distances and gives good approximate results to similarity

queries.

References

1. Agarwal, P.K., Arge, L., and Erickson, J. (2000). Indexing Moving Points. In

Proc. of the 19th ACM Symp. on Principles of Database Systems (PODS) ,

pp. 175-186.

2. Agrawal, R., Faloutsos, C., and Swami, A. (1993). Ecient Similarity Search

in Sequence Databases. In Proc.ofthe4thFODO , pp. 69-84.

Search WWH ::

Custom Search

Home