Database Reference
In-Depth Information
points by a compression factor typically in the range from 10 to 600 for
real data [Keogh (1997)], outperforming methods based on the Discrete
Fourier Transform by one to three orders of magnitude [Keogh and Pazzani
(1999b)]. This approximation is shown to be valid under several distance
measures, including dynamic time warping distance [Keogh and Pazzani
(1999b)]. An enhanced representation is introduced in [Keogh and Pazzani
(1998)], where every line segment in the approximation is augmented with
a weight representing its relative importance; for instance, a combined
sequence may be constructed representing a class of sequences, and some
line segments may be more representative of the class than others.
4.3. Search Space Pruning through Subsequence Hashing
Keogh and Pazzani (1999a) describe an indexing method based on hashing,
in addition to the piecewise linear approximation. An equi-spaced template
grid window is moved across the sequence, and for each position a hash key
is generated to decide into which bin the corresponding subsequence is put.
The hash key is simply a binary string, where 1 means that the sequence
is predominantly increasing in the corresponding part of the template grid,
while 0 means that it is decreasing. These bin keys may then be used during
a search, to prune away entire bins without examining their contents. To get
more benefit from the bin pruning, the bins are arranged in a best-first order.
5. Conclusion
This chapter has sought to give an overview of recent advances in the field of
similarity based retrieval in time sequence databases. First, the problem of
similarity search and the desired properties of robust distance measures and
good indexing methods were outlined. Then, the general approach of signa-
ture based similarity search was described. Following the general descrip-
tion, three specific signature extraction approaches were discussed: Spectral
signatures, based on Fourier components (or wavelet components); piece-
wise constant approximation, and the related method adaptive piecewise
constant approximation; and landmark methods, based on the extraction of
significant points in a sequence. Finally, some methods that are not based
on signature extraction were mentioned.
Although the field of time sequence indexing has received much atten-
tion and is now a relatively mature field [Keogh et al . (2002)] there are still
areas where further research might be warranted. Two such areas are (1)
thorough empirical comparisons and (2) applications in data mining.
Search WWH ::




Custom Search