Mining Sequential Patterns from Probabilistic Databases by Pattern-Growth - Advances in Databases

Database Reference

In-Depth Information

Table 2. An example of computing the ES of all S-extensions of s =( a )forsource

X in the sample database of Table 1. The gray row is the B X,s array. In the bottom

half, the cells that changed in F after processing the corresponding event are marked

as gray. After processing source X , values in the final column are updated to G .

D X

( a, c, e :0 . 1)

( b, c, d :0 . 7)

( a, d, e :0 . 2)

( b, c, e :0 . 6)

(a)

0.1

0.28

(a)

0.0

0.02

0.020

(b)

0.0

0.07

0.196

(c)

0.0

0.07

0.196

(d)

0.0

0.07

0.076

(e)

0.0

0.02

0.176

(i)

(ii)

(iii)

(iv)

Objective: To compute the ES of all the S- or I-extensions of s in one pass

over the database, and thus discover all frequent extensions of s .

Steps: We consider the two cases of finding the S- and I-extensions of s in turn.

We compute the frequent S-extensions of s as follows: Let i be a source, D i

( e 1 ,c 1 ) ,..., ( e r ,c r )

,and s =

s 1 ,...,s q

be any sequence. Initialize two arrays

F and G ,eachofsize q =

to zero and consider each source i in turn. Then

scan B i,s up-to the first non-zero entry e k ,andforeveryitem x in e , k<≤ r ,

update F [ x ] as follows:

|I|

F [ x ] := ((1

∗ F [ x ]) + ( c ∗ B i,s [ −

− c )

1])

(5)

We keep track of all the non-zero entries in F [ x ], and once finished with source

i ,update G [ x ]:= G [ x ]+ F [ x ]andreset F [ x ] to zero. After all the sources i

are processed, all the entries in G [ x ]

≥ θm are frequent S-extensions of s .An

example of this computation is shown in Table 2.

For the I-extensions case, the initializations are the same as for the S-extensions

case. For a sequence s = s 1 ,...,s q and source i , when scanning B i,s up-to the

first non-zero entry e which means s q ⊆ e , for every item x in e that is not in

s q

and is lexicographically greater than all items in s q ,update F [ x ] as follows:

F [ x ]:=(1 − c ) ∗ F [ x ]+( B i,s [ ] − B i,s [ − 1] ∗ (1 − c )) ,

(6)

and apply Eq. 6 to all the events e , ≤ r ,where s q ⊆ e (or alternatively where

the B i,s

values change). After all the sources i are processed, all the entries in

G [ x ]

≥ θm are frequent I-extensions of s .

Pattern-Growth Algorithm. An overview of our pattern-growth algorithm

is in Fig. 1. We first compute the set of frequent 1-sequences, L 1 (Line 3) (as-

sume L 1 is in ascending order). For each 1-sequence x , first we compute the B i,x

Advances in Databases

Search WWH ::

Custom Search

Home