Database Reference
In-Depth Information
Fig. 7.
A sequence reconstructed from a spectral signature.
that makes subsequence matching possible. The main steps of the approach
are as follows:
1. For each position in the database, extract a window of length
w
,and
create a spectral signature (a point )forit.
Each point will be close to the previous, because the contents of the
sliding window change slowly. The points for one sequence will therefore
constitute a trail in signature space.
2. Partition the trails into suitable (multidimensional) Minimal Bounding
Rectangles (MBRs), according to some heuristic.
3. Store the MBRs in a spatial index structure.
To search for subsequences similar to a query
q
of length
w
,simply
look up all MBRs that intersect a hypersphere with radius
ε
around the
signature point ˜
. This is guaranteed not to produce any false dismissals,
because if a point is within a radius of
q
ε
q
, it cannot possibly be contained
in an MBR that does not intersect the hypersphere.
To search for sequences longer than
of ˜
-length
segments, search for each of them, and intersect the result sets. Because
a sequence in the result set R cannot be closer to the full query sequence
than it is to any one of the window signatures, it has to be close to all of
them, that is, contained in all the result sets.
These two papers [Agrawal et al. (1993) and Faloutsos et al . (1994)]
are seminal; several newer approaches are based on them. For example,
Rafiei and Mendelzon (1997) show how the method can be made more
robust by allowing various transformations in the comparison, and Chan
and Fu (1999) show how the Discrete Wavelet Transform (DWT) can be
used instead of the Discrete Fourier Transform (DFT), and that the DWT
method is empirically superior. See Wu et al. (2000) for a comparison
between similarity search based on DFT and DWT.
w
, split the query into
w
Search WWH ::




Custom Search