Database Reference
In-Depth Information
Observe that it is not feasible to use Eq. 1 directly due to the exponential number
of possible worlds. Consider for example, we want to compute the probability
with which source X supports a sequence s =( a )( b ) in the sample database
of Table 1. There can be two ways in which s can be supported by source X
i.e. either e 1 does support ( a ) and at least one of the events e 2 or e 4 support
( b ) or alternatively, e 1 does not support ( a ) but e 3 does support ( a ), and e 4
supports ( b ). The probability that X supports s is calculated as 0 . 196. Clearly,
computing the source support probability this way is an expensive operation. In
[12], a dynamic programming (DP) based algorithm was proposed to compute
the source support probability Pr[ s D i
], and it was shown that the ES of a
sequence could be computed as follows:
ES ( s, D p )= m
Pr[ s D i
]
(2)
i
=1
3 Pattern-Growth Approach
We now describe our Pattern-Growth-Approach (PGA) that uses in essence
the already proposed sub-routines, Dynamic Programming (DP) and fast L 1
computation [12]; and integrates it with the pattern-growth mechanism [13]. We
first review some concepts:
p-Sequences. A p-sequence is analogous to a source sequence in classical SPM,
and is a sequence of the form
( e 1 ,c 1 ) ... ( e k ,c k )
,where e j
is an event and c j
is a confidence value. An SLU database D p
can be viewed as a collection of
p-sequences D 1 ,...,D p m
,where D i
is the p-sequence of source i , and contains a
list of those events in D p that have non-zero confidence of being associated with
source i , ordered by eid , together with the associated confidence (see Table 1(R)).
Further, given a sequence s =
and an item x , s can either be extended
by adding x as a separate element in s , s·{x}
s 1 ,...,s q
(S-extension) or by appending x to
the last element in s ,
(I-extension). For example, for s =( a )( b )
and x = c , S- and I-extensions of s are ( a )( b )( c )and( a )( b, c ) respectively.
Dynamic Programming. Let i be a source, D i
s 1 ,...,s q ∪{x}
=
( e 1 ,c 1 ) ,..., ( e r ,c r )
,and
s =
be any sequence. Now let A i,s be the ( q × r )DPmatrixusedto
compute Pr[ s D i
s 1 ,...,s q
], and let B i,s
denote the last row of A i,s ,thatis, B i,s [ ]=
A i,s [ q, ]for =1 ,...,r .For1
≤ k ≤ q and 1
≤ ≤ r , A [ k, ] will contain
], so A [ q, r ]isthevaluePr[ s D i
Pr[
s 1 ,...,s k
( e 1 ,c 1 ) ,..., ( e ,c )
]. We
set A [1 , ] = 1 for all ,1
≤ ≤ r and A [ k, 1] = 0 for all 1
≤ k ≤ q , and compute
the other values row-by-row. For 1
≤ k ≤ q and 1
≤ ≤ r , define:
= c if s k ⊆ e
0otherwise ,
c k
(3)
where c k
is the probability that the element s k
is contained in the event e
in
source i ;If s k ⊆ e , c k
is equal to the probability that e
is associated with
source i , and 0 otherwise. Next, use the following recurrence:
 
Search WWH ::




Custom Search