Mining Sequential Patterns from Probabilistic Databases by Pattern-Growth - Advances in Databases - page 120

Database Reference

In-Depth Information

such that s k ⊆ t i j ,for k =1 ,...,q .The source sequence D i corresponding to a

source i is just the multiset

, ordered by eid . For any sequence

s , define its support in D , denoted Sup ( s, D ) is the number of sources i such

that s D i . The objective is to find all sequences s such that Sup ( s, D )

{e|

( eid, e, i )

∈ D}

≥ θm

for some user-defined threshold 0 <θ≤

1.

Probabilistic Databases. We define an SLU probabilistic database D p to be an

ordered list

of records of the form ( eid ,e, W )where eid is an event-id,

e is an event and W is a probability distribution over

r 1 ,...,r n

; the list is ordered by eid .

The distribution W contains pairs of the form ( σ, c ), where σ ∈S

S

and 0 <c≤

1

is the confidence that the event e is associated with source σ and ( σ,c ) ∈W c =1.

An example can be found in Table 1(L). The possible worlds semantics of D p

is as follows. A possible world D ∗ of D p is generated by taking each event e i in

turn, and assigning it to one of the possible sources σ i ∈ W i . Thus every record

r i =( eid i ,e i ,W i )

in D ∗ .

The complete set of possible worlds is obtained by enumerating all such possible

combinations. We assume that the distributions W i associated with each record

r i

∈ D p takes the form r i =( eid i ,e i ,σ i ), for some σ i ∈S

in D p are stochastically independent; the probability of a possible world D ∗

is therefore Pr[ D ∗ ]= i =1 Pr W i [ σ i ]. For example, a possible world D ∗ for the

database of Table 1 can be generated by assigning event e 1 to Z with probability

0 . 3, events e 2 and e 4 to X with probabilities 0 . 7and0 . 6 respectively, and event

e 3 to Y with probability 0 . 3, and Pr[ D ∗ ]=0 . 3

×

0 . 7

×

0 . 3

×

0 . 6=0 . 0378.

Table 1. An SLU event database (L) transformed to p-sequences (R). Note that the

events like e 1 (marked with † on (R)) can only be associated with one of the sources

X, Y and Z in any possible world

eid event

W

p-sequence

D X ( a, c, e :0 . 1) † ( b, c, d :0 . 7)( a, d, e :0 . 2)

( b, c, e :0 . 6)

D Y ( a, c, e :0 . 6) † ( b, c, d :0 . 3)( a, d, e :0 . 3)

D Z ( a, c, e :0 . 3) † ( a, d, e :0 . 5)( b, c, e :0 . 4)

e 1

( a, c, e ) ( X :0 . 1)( Y :0 . 6)( Z :0 . 3)

e 2

( b, c, d )

( X :0 . 7)( Y :0 . 3)

e 3

( a, d, e ) ( X :0 . 2)( Y :0 . 3)( Z :0 . 5)

e 4

( b, c, e )

( X :0 . 6)( Z :0 . 4)

As a possible world is a deterministic instance of a given probabilistic database,

concepts like the support of a sequence in a possible world do apply. The expected

support of a sequence s in D p is computed as follows:

ES ( s, D p )=

Pr[ D ∗ ]

Sup ( s, D ∗ ) ,

∗

(1)

D ∗ ∈PW ( D p )

The problem we consider is:

Given an SLU probabilistic database D p , determine all sequences s such

that ES ( s, D p )

≥ θm , for some user-specified threshold θ ,0 <θ≤

1.

Next Page

Advances in Databases

Search WWH ::

Custom Search

Home