A SURVEY OF RECENT METHODS FOR EFFICIENT RETRIEVAL OF SIMILAR TIME SEQUENCES - Data Mining in Time Series Databases

Database Reference

In-Depth Information

•

Finding out whether a musical score is similar to one of a set of copy-

righted scores.

•

Finding portions of seismic waves that are not similar to spot geological

irregularities.

Applications range from medicine, through economy, to scientific disci-

plines such as meteorology and astrophysics [Faloutsos et al. (1994), Yi and

Faloutsos (2000)].

The running times of simple algorithms for comparing time sequences

are generally polynomial in the length of both sequences, typically linear or

quadratic. To find the correct offset of a query in a large database, a naive

sequential scan will require a number of such comparisons that is linear in

the length of the database. This means that, given a query of length

m

and

a database of length

n

, the search will have a time complexity of O(nm) ,

nm 2 ) or worse. For large databases this is clearly unacceptable.

Many methods are known for performing this sort of query in the domain

of strings over finite alphabets, but with time sequences there are a few extra

issues to deal with:

or even

O

(

•

The range of values is not generally finite, or even discrete.

•

The sampling rate may not be constant.

•

The presence of noise in various forms makes it necessary to support very

flexible similarity measures.

This chapter describes some of the recent advances that have been made

in this field; methods that allow for indexing of time sequences using flexible

similarity measures that are invariant under a wide range of transformations

and error sources.

The chapter is structured as follows: Section 2 gives a more formal

presentation of the problem of similarity-based retrieval and the so-called

dimensionality curse; Section 3 describes the general approach of signature

based retrieval, or shrink and search, as well as three specific methods using

this approach; Section 4 shows some other approaches, while Section 5

concludes the chapter. Finally, Appendix gives an overview of some basic

distance measures. 1

1 The term “distance” is used loosely in this paper. A distance measure is simply the

inverse of a similarity measure and is not required to obey the metric axioms.

Data Mining in Time Series Databases

Search WWH ::

Custom Search

Home