INDEXING TIME-SERIES UNDER CONDITIONS OF NOISE - Data Mining in Time Series Databases

If $\delta$ is small, the dynamic programming algorithm is very efficient. However, for some applications $\delta$ may need to be large. In this case, we can speed up the above computation using random sampling. Given two sequences $A$ and $B$, we compute two subsets $R_A$ and $R_B$ by sampling each sequence. Then we use the dynamic programming algorithm to compute the LCSS on $R_A$ and $R_B$. We can show that, with high probability, the result of the algorithm over the samples is a good approximation of the actual value. We describe this technique in detail in [40].
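
To illustrate why a small $\delta$ keeps the dynamic program cheap, the following is a minimal Python sketch of a banded LCSS computation. It assumes the matching conditions stated in Section 4.2 ($|i - j| < \delta$ and $|a_j - b_i| \le \varepsilon$); the function name `lcss` and the dictionary-based table are illustrative choices, not taken from the text, and the sampling step of [40] is not shown.

```python
def lcss(a, b, delta, eps):
    """LCSS length where a[j] and b[i] may match only if
    |i - j| < delta and |a[j] - b[i]| <= eps (assumed matching rule)."""
    n, m = len(a), len(b)
    if n == 0 or m == 0 or delta <= 0:
        return 0
    dp = {}  # (j, i) -> LCSS of a[:j] and b[:i], stored only inside the band

    def cell(j, i):
        if j == 0 or i == 0:
            return 0
        # Outside the band no new pair can match, so the value equals the
        # value at the nearest band edge in the same row or column.
        if i - j >= delta:
            i = j + delta - 1
        elif j - i >= delta:
            j = i + delta - 1
        return dp[(j, i)]

    for j in range(1, n + 1):
        lo = max(1, j - delta + 1)
        hi = min(m, j + delta - 1)
        for i in range(lo, hi + 1):  # only cells with |i - j| < delta
            if abs(a[j - 1] - b[i - 1]) <= eps:
                dp[(j, i)] = cell(j - 1, i - 1) + 1
            else:
                dp[(j, i)] = max(cell(j - 1, i), cell(j, i - 1))
    return cell(n, m)
```

Only the cells inside the band are visited, so the work in this sketch grows with the band width rather than with the full product of the two sequence lengths.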

4.2. Computing the Similarity Function $S_2$

We now consider the more complex similarity function $S_2$. Here, given two sequences $A$, $B$, and constants $\delta$, $\varepsilon$, we have to find the translation $f_c$ that maximizes the length of the longest common subsequence of $A$ and $f_c(B)$, that is, $LCSS_{\delta,\varepsilon}(A, f_c(B))$, over all possible translations.

A one-dimensional translation $f_c$ is a function that adds a constant to all the elements of a one-dimensional sequence:

$$f_c(x_1, \ldots, x_m) = (x_1 + c, \ldots, x_m + c)$$

Let the lengths of sequences $A$ and $B$ be $n$ and $m$ respectively. Let us also assume that $f_{c_1}$ is the translation that, when applied to $B$, gives a longest common subsequence, and that it is also the translation that maximizes the length of the longest common subsequence:

$$LCSS_{\delta,\varepsilon}(A, f_{c_1}(B)) = \max_{c \in \mathbb{R}} LCSS_{\delta,\varepsilon}(A, f_c(B))$$

The key observation is that, although there are infinitely many translations that we can apply to $B$, each translation $f_c$ results in a longest common subsequence between $A$ and $f_c(B)$, and there is only a finite set of possible longest common subsequences. In this section we show that we can efficiently enumerate a finite set of translations, such that this set provably includes a translation that maximizes the length of the longest common subsequence of $A$ and $f_c(B)$.
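
One way to make the finite-candidate idea concrete is sketched below in Python. It rests on an assumption that is not spelled out in the text at this point: a pair $(a_j, b_i)$ with $|i - j| < \delta$ can match exactly when $c$ lies in the closed interval $[a_j - \varepsilon - b_i,\ a_j + \varepsilon - b_i]$, so it is enough to try the endpoints of these intervals. The names `lcss`, `translate`, `candidate_translations`, and `best_translation` are hypothetical, and the exhaustive evaluation is a simple illustration rather than the chapter's own enumeration procedure.

```python
def lcss(a, b, delta, eps):
    """Plain dynamic program for the LCSS length with the assumed matching
    rule |i - j| < delta and |a[j] - b[i]| <= eps."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            if abs(i - j) < delta and abs(a[j - 1] - b[i - 1]) <= eps:
                dp[j][i] = dp[j - 1][i - 1] + 1
            else:
                dp[j][i] = max(dp[j - 1][i], dp[j][i - 1])
    return dp[n][m]


def translate(seq, c):
    """f_c: add the constant c to every element of the sequence."""
    return [x + c for x in seq]


def candidate_translations(a, b, delta, eps):
    """Endpoints of the intervals of c for which some pair (a_j, b_i)
    with |i - j| < delta can be matched."""
    cands = set()
    for j, aj in enumerate(a):
        for i, bi in enumerate(b):
            if abs(i - j) < delta:
                cands.add(aj - eps - bi)
                cands.add(aj + eps - bi)
    return sorted(cands)


def best_translation(a, b, delta, eps):
    """Try every candidate c and keep the first one with the longest LCSS.
    Assumes exact arithmetic (e.g. integers); floats may need a tolerance."""
    best_c, best_len = 0, 0
    for c in candidate_translations(a, b, delta, eps):
        length = lcss(a, translate(b, c), delta, eps)
        if length > best_len:
            best_c, best_len = c, length
    return best_c, best_len
```

For example, with `a = [1, 2, 3, 4, 5]`, `b = [11, 12, 13, 14, 15]`, `delta = 2`, and `eps = 0`, the candidates are `-11`, `-10`, and `-9`, and `best_translation` returns `(-10, 5)`. Running one full dynamic program per candidate is costly; the sketch is meant only to show that a finite candidate set suffices.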

A translation by $c$, applied to $B$, can be thought of as a linear transformation of the form $f(b_i) = b_i + c$. Such a transformation will allow $b_i$ to be matched to all $a_j$ for which $|i - j| < \delta$ and $a_j - \varepsilon \le f(b_i) \le a_j + \varepsilon$.

It is instructive to view this as a stabbing problem: consider the $O(\delta(n + m))$ vertical line segments $((b_i, a_j - \varepsilon), (b_i, a_j + \varepsilon))$, where $|i - j| < \delta$ (Figure 10). These line segments lie on a two-dimensional plane, where on the $x$ axis we put elements of $B$ and on the $y$ axis we put elements of $A$. For every
