Database Reference
In-Depth Information
in all dimensions, usually satisfied through the somewhat stricter require-
ment of the triangle inequality (
d
x
z
≤ d
x, y
d
y
z
)).
One important problem that occurs when trying to index sequences with
spatial access methods is the so-called dimensionality curse : Spatial indices
typically work only when the number of dimensions is low [Chakrabarti
and Mehrotra (1999)]. This makes it unfeasible to code the entire sequence
directly as a vector in an indexed space.
The general solution to this problem is dimensionality reduction :to
condense the original sequences into signatures in a signature space of low
dimensionality, in a manner which, to some extent, preserves the distances
between them. One can then index the signature space.
(
,
)
(
)+
(
,
3. Signature Based Similarity Search
x
n
A time sequence
of length
can be considered a vector or point in an
n
-dimensional space. Techniques exist (spatial access methods, such as the
R
-tree and variants [Chakrabarti and Mehrotra (1999), Wang and Perng
(2001), Sellis et al. (1987)] for indexing such data. The problem is that
the performance of such methods degrades considerably even for relatively
low dimensionalities [Chakrabarti and Mehrotra (1999)]; the number of
dimensions that can be handled is usually several orders of magnitude lower
than the number of data points in a typical time sequence.
A general solution described by Faloutsos et al. (1994; 1997) is to extract
a low-dimensional signature from each sequence, and to index the signature
space. This shrink and search approach is illustrated in Figure 2.
Fig. 2.
The signature based approach.
 
Search WWH ::




Custom Search