Database Reference
In-Depth Information
clusters in the datasets (Figure 21). As the number of clusters increased
the performance of the algorithm improved considerably. This behavior is
expected and it is similar to the behavior of recent proposed index structures
for high dimensional data [6,9,22]. On the other hand if the dataset has no
clusters, the performance of the algorithm degrades, since the majority of
the sequences have almost the same distance to the query. This behavior
follows again the same pattern of high dimensional indexing methods [6,43].
7. Conclusions
We have presented ecient techniques to accurately compute the simi-
larity between time-series with significant amount of noise. Our distance
measure is based on the LCSS model and performs very well for noisy sig-
nals. Since the exact computation is inecient, we presented approximate
algorithms with provable performance bounds. Moreover, we presented an
ecient index structure, which is based on hierarchical clustering, for sim-
ilarity (nearest neighbor) queries. The distance that we use is not a metric
and therefore the triangle inequality does not hold. However, we prove that
a similar inequality holds (although a weaker one) that allows to prune
parts of the datasets without any false dismissals.
Our experiments indicate that the approximation algorithm can be used
to get an accurate and fast estimation of the distance between two time-
series even under noisy conditions. Also, results from the index evaluation
show that we can achieve good speed-ups for searching similar sequences
comparing with the brute force linear scan.
We plan to investigate biased sampling to improve the running time of
the approximation algorithms, especially when full rigid transformations
(e.g. shifting, scaling and rotation) are necessary. Another approach to
index time-series for similarity retrieval is to use embeddings and map the
set of sequences to points in a low dimensional Euclidean space [15]. The
challenge of course is to find an embedding that approximately preserves the
original pairwise distances and gives good approximate results to similarity
queries.
References
1. Agarwal, P.K., Arge, L., and Erickson, J. (2000). Indexing Moving Points. In
Proc. of the 19th ACM Symp. on Principles of Database Systems (PODS) ,
pp. 175-186.
2. Agrawal, R., Faloutsos, C., and Swami, A. (1993). Ecient Similarity Search
in Sequence Databases. In Proc.ofthe4thFODO , pp. 69-84.
Search WWH ::




Custom Search