Database Reference
In-Depth Information
The published methods have undergone thorough empirical tests that
evaluate their performance (usually by comparing them to sequential scan,
and, in some cases, to the basic spectral signature methods), but comparing
the results is not a trivial task—in most cases it might not even be very
meaningful, since variations in performance may be due to implementation
details, available hardware, and several other factors that may not be inher-
ent in the indexing methods themselves. Implementing several of the most
promising methods and testing them on real world problems (under similar
conditions) might lead to new insights, not only about their relative perfor-
mances in general, but also about which methods are best suited for which
problems. Although some comparisons have been made [such as in Wu et al .
(2000) and, in the more general context of spatial similarity search, in Weber
et al . (1998)], little research seems to have been published on this topic.
Data mining in time series databases is a relatively new field [Keogh
et al . (2002)]. Most current mining methods are based on a full, linear scan
of the sequence data. While this may seem unavoidable, constructing an
index of the data could make it possible to perform this full data traversal
only once, and later perform several data mining passes that only use the
index to perform their work. It has been argued that data mining should
be interactive [Das et al . (1998)], in which case such techniques could prove
useful. Some publications can be found about using time sequence indexing
for data mining purposes [such as Keogh et al . (2002), where a method is
presented for mining patterns using a sux tree index] but there is still a
potential for combining existing sequence mining techniques with existing
methods for similarity-based retrieval.
Appendix Distance Measures
Faloutsos et al . (1997) describe a general framework for sequence distance
measures [a similar framework can be found in Jagadish et al . (1995)].
They show that many common distance measures can be expressed in the
following form:
)=min min T 1 ,T 2 T {c
T 1 )+
c
T 2 )+
d
T 1 (
x
,T 2 (
y
}
(
(
(
)
))
d
(
x, y
(5)
d 0 (
x, y
.
)
T
is a set of allowable transformations,
c
(
T i )isthe cost of performing
the transformation
T i ,
T i (
x
) is the sequence resulting from performing the
transformation
d 0 is a so-called base distance , typically calcu-
lated in linear time. For instance,
T i on
x
,and
L p
norms (such as Manhattan distance
Search WWH ::




Custom Search