Mining Sequential Patterns from Probabilistic Databases by Pattern-Growth - Advances in Databases

Database Reference

In-Depth Information

Observe that it is not feasible to use Eq. 1 directly due to the exponential number

of possible worlds. Consider for example, we want to compute the probability

with which source X supports a sequence s =( a )( b ) in the sample database

of Table 1. There can be two ways in which s can be supported by source X

i.e. either e 1 does support ( a ) and at least one of the events e 2 or e 4 support

( b ) or alternatively, e 1 does not support ( a ) but e 3 does support ( a ), and e 4

supports ( b ). The probability that X supports s is calculated as 0 . 196. Clearly,

computing the source support probability this way is an expensive operation. In

[12], a dynamic programming (DP) based algorithm was proposed to compute

the source support probability Pr[ s D i

], and it was shown that the ES of a

sequence could be computed as follows:

ES ( s, D p )= m

Pr[ s D i

]

(2)

3 Pattern-Growth Approach

We now describe our Pattern-Growth-Approach (PGA) that uses in essence

the already proposed sub-routines, Dynamic Programming (DP) and fast L 1

computation [12]; and integrates it with the pattern-growth mechanism [13]. We

first review some concepts:

p-Sequences. A p-sequence is analogous to a source sequence in classical SPM,

and is a sequence of the form

( e 1 ,c 1 ) ... ( e k ,c k )

,where e j

is an event and c j

is a confidence value. An SLU database D p

can be viewed as a collection of

p-sequences D 1 ,...,D p m

,where D i

is the p-sequence of source i , and contains a

list of those events in D p that have non-zero confidence of being associated with

source i , ordered by eid , together with the associated confidence (see Table 1(R)).

Further, given a sequence s =

and an item x , s can either be extended

by adding x as a separate element in s , s·{x}

s 1 ,...,s q

(S-extension) or by appending x to

the last element in s ,

(I-extension). For example, for s =( a )( b )

and x = c , S- and I-extensions of s are ( a )( b )( c )and( a )( b, c ) respectively.

Dynamic Programming. Let i be a source, D i

s 1 ,...,s q ∪{x}

( e 1 ,c 1 ) ,..., ( e r ,c r )

,and

s =

be any sequence. Now let A i,s be the ( q × r )DPmatrixusedto

compute Pr[ s D i

s 1 ,...,s q

], and let B i,s

denote the last row of A i,s ,thatis, B i,s [ ]=

A i,s [ q, ]for =1 ,...,r .For1

≤ k ≤ q and 1

≤ ≤ r , A [ k, ] will contain

], so A [ q, r ]isthevaluePr[ s D i

Pr[

s 1 ,...,s k

( e 1 ,c 1 ) ,..., ( e ,c )

]. We

set A [1 , ] = 1 for all ,1

≤ ≤ r and A [ k, 1] = 0 for all 1

≤ k ≤ q , and compute

the other values row-by-row. For 1

≤ k ≤ q and 1

≤ ≤ r , define:

= c if s k ⊆ e

0otherwise ,

c k

(3)

where c k

is the probability that the element s k

is contained in the event e

source i ;If s k ⊆ e , c k

is equal to the probability that e

is associated with

source i , and 0 otherwise. Next, use the following recurrence:

Advances in Databases

Search WWH ::

Custom Search

Home