INDEXING TIME-SERIES UNDER CONDITIONS OF NOISE - Data Mining in Time Series Databases - page 84

Database Reference

In-Depth Information

Fig. 11. If we can afford to be within a certain error range then we don't have to try

all translations.

We can now prove our main theorem:

Theorem 2. Given two sequences A and B, with

|A|

=

n

and

|B|

=

m

(suppose

m<n

), and a constant 0

<β<

1 , we can find an approxima-

tion

AS

2 δ,β (

A, B

) of the similarity

S

2(

δ, ε, A, B

) such that

S

2(

δ, ε, A, B

)

−

nδ 2 /β

AS

2 δ,β (

A, B

)

<β

in

O

(

) time.

Proof: Let

a

=

S

2(

δ, ε, A, B

). There exists a translation

f i

such that

L fi

is a superset of the matches in the optimal LCSS of

A

and

B

. In addition,

by the previous lemma, there are 2

b

translations (

f i−b ,...,f i + b )thathave

at most

different matchings from the optimal.

Therefore, if we use the translations

b

,..., δ b

in the order-

f ib ,for

i

=1

ing described above, we are within

b

different matchings from the opti-

mal matching of

A

and

B

(Figure 11). We can find these translations in

O

(

δn

) time if we find and sort all the translations.

Alternatively, we can find these translations in

log

n

( δ b δn

O

)timeifwerun

[ δ b

] quantile operations.

So we have a total of δ b

translations and setting

b

=

βn

completes the

proof.

A, B

n, m

m<n

Given sequences

with lengths

respectively (

), and con-

stants

δ, β, ε

, the approximation algorithm works as follows:

(i) Find all sets of translations for sequences A and B.

Next Page

Data Mining in Time Series Databases

Search WWH ::

Custom Search

Home