INDEXING TIME-SERIES UNDER CONDITIONS OF NOISE - Data Mining in Time Series Databases

Database Reference

In-Depth Information

In order to provide a lower bound we have to maximize the expression

|B|−LCSS δ,ε,F (

M c ,B

). Therefore, for every node of the tree along with

themedoidwehavetokeepthesequence

r c that maximizes this expression.

If the length of the query is smaller than the shortest length of the sequences

we are currently considering we use that, otherwise we use the minimum

and maximum lengths to obtain an approximate result.

5.2. Searching the Index Tree for Nearest Trajectories

We assume that we search an index tree that contains time-series with

minimum length, min

. For simplicity we dis-

cuss the algorithm for the 1-Nearest Neighbor query, where given a query

sequence

l

and maximum length, max

l

Q

we try to find the sequence in the set that is the most similar

Q

N

to

. The search procedure takes as input a node

in the tree, the query

Q

and the distance to the closest time-series found so far. For each of the

children

, we check if the child is a single sequence or a cluster. In case

that it is a sequence, we just compare its distance to

C

with the current

nearest sequence. If it is a cluster, we check the length of the query and we

choose the appropriate value for min(

Q

|B|, |Q|

). Then we compute a lower

bound

to the distance of the query with any sequence in the cluster and

we compare the result with the distance of the current nearest neighbor

mindist . We need to examine this cluster only if

L

is smaller than mindist .

In our scheme we use an approximate algorithm to compute the

LCSS δ,ε,F . Consequently, the value of

L

LCSS δ,ε,F (

M C ,B

)

/

min(

|B|, |Q|

)

that we compute can be up to

times higher than the exact value. There-

fore, since we use the approximate algorithm of section 3.2 for indexing

trajectories, we have to subtract

β

β ∗ min(

|M C |, |Q|

)

/

min(

|B|, |Q|

)fromthe

bound we compute for

D

2(

δ, ε, B, Q

).

6. Experimental Evaluation

We implemented the proposed approximation and indexing techniques as

they are described in the previous sections and here we present experimen-

tal results evaluating our techniques. We describe the datasets and then

we continue by presenting the results. The purpose of our experiments is

twofold: first, to evaluate the eciency and accuracy of the approximation

algorithm presented in Section 4 and second to evaluate the indexing tech-

nique that we discussed in the previous section. Our experiments were run

on a PC Pentium III at 1 GHz with 128 MB RAM and 60 GB hard disk.

Data Mining in Time Series Databases

Search WWH ::

Custom Search

Home