A SURVEY OF RECENT METHODS FOR EFFICIENT RETRIEVAL OF SIMILAR TIME SEQUENCES - Data Mining in Time Series Databases

Database Reference

In-Depth Information

The published methods have undergone thorough empirical tests that

evaluate their performance (usually by comparing them to sequential scan,

and, in some cases, to the basic spectral signature methods), but comparing

the results is not a trivial task—in most cases it might not even be very

meaningful, since variations in performance may be due to implementation

details, available hardware, and several other factors that may not be inher-

ent in the indexing methods themselves. Implementing several of the most

promising methods and testing them on real world problems (under similar

conditions) might lead to new insights, not only about their relative perfor-

mances in general, but also about which methods are best suited for which

problems. Although some comparisons have been made [such as in Wu et al .

(2000) and, in the more general context of spatial similarity search, in Weber

et al . (1998)], little research seems to have been published on this topic.

Data mining in time series databases is a relatively new field [Keogh

et al . (2002)]. Most current mining methods are based on a full, linear scan

of the sequence data. While this may seem unavoidable, constructing an

index of the data could make it possible to perform this full data traversal

only once, and later perform several data mining passes that only use the

index to perform their work. It has been argued that data mining should

be interactive [Das et al . (1998)], in which case such techniques could prove

useful. Some publications can be found about using time sequence indexing

for data mining purposes [such as Keogh et al . (2002), where a method is

presented for mining patterns using a sux tree index] but there is still a

potential for combining existing sequence mining techniques with existing

methods for similarity-based retrieval.

Appendix Distance Measures

Faloutsos et al . (1997) describe a general framework for sequence distance

measures [a similar framework can be found in Jagadish et al . (1995)].

They show that many common distance measures can be expressed in the

following form:

)=min min T 1 ,T 2 ∈ T {c

T 1 )+

c

T 2 )+

d

T 1 (

x

,T 2 (

y

}

(

)

))

d

(

x, y

(5)

d 0 (

x, y

.

)

T

is a set of allowable transformations,

c

(

T i )isthe cost of performing

the transformation

T i ,

T i (

x

) is the sequence resulting from performing the

transformation

d 0 is a so-called base distance , typically calcu-

lated in linear time. For instance,

T i on

x

,and

L p

norms (such as Manhattan distance

Data Mining in Time Series Databases

Search WWH ::

Custom Search

Home