INDEXING TIME-SERIES UNDER CONDITIONS OF NOISE - Data Mining in Time Series Databases


(as before, $\mathcal{F}$ is the set of translations). Now we can show the following lemma:

Lemma 4. Given trajectories $A$, $B$, $C$:

$$LCSS_{\delta,2\varepsilon,\mathcal{F}}(A,C) > LCSS_{\delta,\varepsilon,\mathcal{F}}(A,B) + LCSS_{\delta,\varepsilon,\mathcal{F}}(B,C) - |B|$$

where $|B|$ is the length of sequence $B$.

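Proof: Clearly, if an element of $A$ can match an element of $B$ within $\varepsilon$, and the same element of $B$ matches an element of $C$ within $\varepsilon$, then the element of $A$ can also match the element of $C$ within $2\varepsilon$. Since there are at least

$$|B| - \bigl(|B| - LCSS_{\delta,\varepsilon,\mathcal{F}}(A,B)\bigr) - \bigl(|B| - LCSS_{\delta,\varepsilon,\mathcal{F}}(B,C)\bigr)$$

elements of $B$ that match with elements of $A$ and with elements of $C$, it follows that

$$LCSS_{\delta,2\varepsilon,\mathcal{F}}(A,C) > |B| - \bigl(|B| - LCSS_{\delta,\varepsilon,\mathcal{F}}(A,B)\bigr) - \bigl(|B| - LCSS_{\delta,\varepsilon,\mathcal{F}}(B,C)\bigr) = LCSS_{\delta,\varepsilon,\mathcal{F}}(A,B) + LCSS_{\delta,\varepsilon,\mathcal{F}}(B,C) - |B|.$$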
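The counting argument in the proof can be checked numerically on small inputs. The following is a minimal sketch under simplifying assumptions that are not part of the text: sequences are one-dimensional, no time window $\delta$ is enforced, and no translations are applied (the set $\mathcal{F}$ is reduced to the identity). The names `lcss` and `lemma_gap` are illustrative. The check uses the non-strict form of the bound, which is what the counting argument yields directly.

```python
def lcss(a, b, eps):
    """LCSS length for two 1-D sequences; a[i] and b[j] match when
    |a[i] - b[j]| < eps.
    Assumption: no time window (delta) and no translations (F)."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if abs(a[i - 1] - b[j - 1]) < eps:
                # Matching the two last elements extends the best prefix match.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def lemma_gap(a, b, c, eps):
    """Left side minus right side of the lemma.
    A non-negative value means the bound holds for this triple."""
    return lcss(a, c, 2 * eps) - (lcss(a, b, eps) + lcss(b, c, eps) - len(b))


# Hypothetical example data: B sits between A and C.
A = [1.0, 2.0, 3.0]
B = [1.4, 2.4, 3.4]
C = [1.8, 2.8, 3.8]
print(lemma_gap(A, B, C, eps=0.5))
```

Here every element of $B$ matches both $A$ and $C$ within $\varepsilon = 0.5$, so all three elements carry over to a match between $A$ and $C$ within $2\varepsilon$.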

5.1. Indexing Structure

We first partition all the sequences into sets according to length, so that the longest sequence in each set is at most $a$ times the shortest (typically we use $a = 2$). We apply a hierarchical clustering algorithm on each set, and we use the tree that the algorithm produced as follows:

For every node $C$ of the tree we store the medoid ($M_C$) of each cluster. The medoid is the sequence that has the minimum distance (or maximum LCSS) from every other sequence in the cluster:

$$\max_{v_i \in C} \min_{v_j \in C} LCSS_{\delta,\varepsilon,\mathcal{F}}(v_i, v_j, e).$$

So given the tree and a query sequence $Q$, we want to examine whether to follow the subtree that is rooted at $C$. However, from the previous lemma we know that for any sequence $B$ in $C$:

$$LCSS_{\delta,\varepsilon,\mathcal{F}}(B,Q) < |B| + LCSS_{\delta,2\varepsilon,\mathcal{F}}(M_C,Q) - LCSS_{\delta,\varepsilon,\mathcal{F}}(M_C,B)$$

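or in terms of distance:

$$D(\delta,\varepsilon,B,Q) = 1 - \frac{LCSS_{\delta,\varepsilon,\mathcal{F}}(B,Q)}{\min(|B|,|Q|)} > 1 - \frac{|B|}{\min(|B|,|Q|)} - \frac{LCSS_{\delta,2\varepsilon,\mathcal{F}}(M_C,Q)}{\min(|B|,|Q|)} + \frac{LCSS_{\delta,\varepsilon,\mathcal{F}}(M_C,B)}{\min(|B|,|Q|)}$$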
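The pieces of the index described in this section can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: sequences are one-dimensional and non-empty, the LCSS ignores the time window $\delta$ and the translations $\mathcal{F}$, the length partition is a simple greedy pass over length-sorted sequences, and the hierarchical clustering step is omitted (a cluster is just a list). All function names are hypothetical.

```python
def lcss(a, b, eps):
    """LCSS length for 1-D sequences; values match when they differ by
    less than eps. Assumption: no time window and no translations."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if abs(a[i - 1] - b[j - 1]) < eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def partition_by_length(seqs, a=2.0):
    """Greedy grouping of length-sorted sequences: a new group starts when
    the next sequence is more than `a` times the group's shortest member."""
    groups = []
    for s in sorted(seqs, key=len):
        if groups and len(s) <= a * len(groups[-1][0]):
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups


def medoid(cluster, eps):
    """Member whose smallest LCSS to any other member is largest."""
    if len(cluster) == 1:
        return cluster[0]

    def worst_case(i):
        return min(lcss(cluster[i], cluster[j], eps)
                   for j in range(len(cluster)) if j != i)

    return cluster[max(range(len(cluster)), key=worst_case)]


def lcss_upper_bound(b, q, m, eps):
    """|B| + LCSS_{2 eps}(M_C, Q) - LCSS_eps(M_C, B): bounds LCSS_eps(B, Q)
    from above."""
    return len(b) + lcss(m, q, 2 * eps) - lcss(m, b, eps)


def distance_lower_bound(b, q, m, eps):
    """1 - bound / min(|B|, |Q|): bounds the distance of B to Q from below."""
    return 1 - lcss_upper_bound(b, q, m, eps) / min(len(b), len(q))


# Hypothetical cluster and a query that lies far from all of its members.
cluster = [[1.0, 2.0, 3.0, 4.0],
           [1.2, 2.1, 3.3, 4.1],
           [0.9, 2.2, 2.9, 4.4, 5.0]]
query = [7.0, 8.0, 9.0, 10.0]
m = medoid(cluster, eps=0.5)
for b in cluster:
    print(len(b), lcss_upper_bound(b, query, m, 0.5),
          distance_lower_bound(b, query, m, 0.5))
```

In an index, $LCSS(M_C, B)$ and $|B|$ would presumably be computed once at build time, so that at query time only $LCSS_{2\varepsilon}(M_C, Q)$ needs to be evaluated per node. A subtree can be skipped when the distance lower bound of every member already exceeds the best distance found so far.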
