Database Reference
In-Depth Information
Similarity measures. Several researchers have defined similarity as
the distance between points in a feature space. For example, Caraca-Valente
and Lopez-Chavarrias (2000) used Euclidean distance between feature vec-
tors containing angle of knee movement and muscle strength, and Lee et al.
(2000) applied Euclidean distance to compare feature vectors containing
color, texture, and shape of video data. This technique works well when all
features have the same units of scale [Goldin and Kanellakis (1995)], but it
is often ineffective for combining disparate features.
An alternative definition of similarity is based on bounding rectan-
gles; two series are similar if their bounding rectangles are similar. It
allows fast pruning of clearly dissimilar series [Perng et al. (2000), Lee
et al. (2000)], but it is less effective for selecting the most similar series.
The envelope-count technique is based on dividing a series into short seg-
ments, called envelopes, and defining a yes/no similarity for each envelope.
Two series are similar within an envelope if their point-by-point differences
are within a certain threshold. The overall similarity is measured by the
number of envelopes where the series are similar [Agrawal et al. (1996)].
This measure allows fast computation of similarity, and it can be adapted
for noisy and missing data [Das et al. (1997), Bollobas et al. (1997)].
Finally, we can measure a point-by-point similarity of two series and
then aggregate these measures, which often requires interpolation of missing
points. For example, Keogh and Pazzani (1998) used linear interpolation
with this technique, and Perng et al. (2000) applied cubic approximation.
Keogh and Pazzani (2000) also described a point-by-point similarity with
modified Euclidean distance, which does not require interpolation.
Indexing and retrieval. Researchers have studied a variety of tech-
niques for indexing of time series. For example, Deng (1998) applied kd -
trees to arrange series by their significant features, Chan and Fu (1999)
combined wavelet transforms with R-trees, and Bozkaya and her col-
leagues used vantage-point trees for indexing series by numeric features
[Bozkaya et al. (1997), Bozkaya and Ozsoyoglu (1999)]. Park et al. (2001)
indexed series by their local extrema and by properties of the segments
between consecutive extrema. Li et al. (1998) proposed a retrieval technique
based on a multi-level abstraction hierarchy of features. Aggarwal and Yu
(2000) considered grid structures, but found that the grid performance is
often no better than exhaustive search. They also showed that exhaustive
search among compressed series is often faster than sophisticated indexing
techniques.
Search WWH ::




Custom Search