Database Reference
In-Depth Information
such that s k ⊆ t i j ,for k =1 ,...,q .The source sequence D i corresponding to a
source i is just the multiset
, ordered by eid . For any sequence
s , define its support in D , denoted Sup ( s, D ) is the number of sources i such
that s D i . The objective is to find all sequences s such that Sup ( s, D )
{e|
( eid, e, i )
∈ D}
≥ θm
for some user-defined threshold 0 <θ≤
1.
Probabilistic Databases. We define an SLU probabilistic database D p to be an
ordered list
of records of the form ( eid ,e, W )where eid is an event-id,
e is an event and W is a probability distribution over
r 1 ,...,r n
; the list is ordered by eid .
The distribution W contains pairs of the form ( σ, c ), where σ ∈S
S
and 0 <c≤
1
is the confidence that the event e is associated with source σ and ( σ,c ) ∈W c =1.
An example can be found in Table 1(L). The possible worlds semantics of D p
is as follows. A possible world D of D p is generated by taking each event e i in
turn, and assigning it to one of the possible sources σ i ∈ W i . Thus every record
r i =( eid i ,e i ,W i )
in D .
The complete set of possible worlds is obtained by enumerating all such possible
combinations. We assume that the distributions W i associated with each record
r i
∈ D p takes the form r i =( eid i ,e i i ), for some σ i ∈S
in D p are stochastically independent; the probability of a possible world D
is therefore Pr[ D ]= i =1 Pr W i [ σ i ]. For example, a possible world D for the
database of Table 1 can be generated by assigning event e 1 to Z with probability
0 . 3, events e 2 and e 4 to X with probabilities 0 . 7and0 . 6 respectively, and event
e 3 to Y with probability 0 . 3, and Pr[ D ]=0 . 3
×
0 . 7
×
0 . 3
×
0 . 6=0 . 0378.
Table 1. An SLU event database (L) transformed to p-sequences (R). Note that the
events like e 1 (marked with on (R)) can only be associated with one of the sources
X, Y and Z in any possible world
eid event
W
p-sequence
D X ( a, c, e :0 . 1) ( b, c, d :0 . 7)( a, d, e :0 . 2)
( b, c, e :0 . 6)
D Y ( a, c, e :0 . 6) ( b, c, d :0 . 3)( a, d, e :0 . 3)
D Z ( a, c, e :0 . 3) ( a, d, e :0 . 5)( b, c, e :0 . 4)
e 1
( a, c, e ) ( X :0 . 1)( Y :0 . 6)( Z :0 . 3)
e 2
( b, c, d )
( X :0 . 7)( Y :0 . 3)
e 3
( a, d, e ) ( X :0 . 2)( Y :0 . 3)( Z :0 . 5)
e 4
( b, c, e )
( X :0 . 6)( Z :0 . 4)
As a possible world is a deterministic instance of a given probabilistic database,
concepts like the support of a sequence in a possible world do apply. The expected
support of a sequence s in D p is computed as follows:
ES ( s, D p )=
Pr[ D ]
Sup ( s, D ) ,
(1)
D ∈PW ( D p )
The problem we consider is:
Given an SLU probabilistic database D p , determine all sequences s such
that ES ( s, D p )
≥ θm , for some user-specified threshold θ ,0 <θ≤
1.
 
Search WWH ::




Custom Search