Database Reference
In-Depth Information
Observe that it is not feasible to use Eq. 1 directly due to the exponential number
of possible worlds. Consider for example, we want to compute the probability
with which source
X
supports a sequence
s
=(
a
)(
b
) in the sample database
of Table 1. There can be two ways in which
s
can be supported by source
X
i.e. either
e
1
does support (
a
) and at least one of the events
e
2
or
e
4
support
(
b
) or alternatively,
e
1
does not support (
a
) but
e
3
does support (
a
), and
e
4
supports (
b
). The probability that
X
supports
s
is calculated as 0
.
196. Clearly,
computing the source support probability this way is an expensive operation. In
[12], a dynamic programming (DP) based algorithm was proposed to compute
the source support probability Pr[
s D
i
], and it was shown that the
ES
of a
sequence could be computed as follows:
ES
(
s, D
p
)=
m
Pr[
s D
i
]
(2)
i
=1
3 Pattern-Growth Approach
We now describe our Pattern-Growth-Approach (PGA) that uses in essence
the already proposed sub-routines, Dynamic Programming (DP) and fast
L
1
computation [12]; and integrates it with the pattern-growth mechanism [13]. We
first review some concepts:
p-Sequences.
A
p-sequence
is analogous to a source sequence in classical SPM,
and is a sequence of the form
(
e
1
,c
1
)
...
(
e
k
,c
k
)
,where
e
j
is an event and
c
j
is a confidence value. An SLU database
D
p
can be viewed as a collection of
p-sequences
D
1
,...,D
p
m
,where
D
i
is the p-sequence of source
i
, and contains a
list of those events in
D
p
that have non-zero confidence of being associated with
source
i
, ordered by
eid
, together with the associated confidence (see Table 1(R)).
Further, given a sequence
s
=
and an item
x
,
s
can either be extended
by adding
x
as a separate element in
s
,
s·{x}
s
1
,...,s
q
(S-extension) or by appending
x
to
the last element in
s
,
(I-extension). For example, for
s
=(
a
)(
b
)
and
x
=
c
, S- and I-extensions of
s
are (
a
)(
b
)(
c
)and(
a
)(
b, c
) respectively.
Dynamic Programming.
Let
i
be a source,
D
i
s
1
,...,s
q
∪{x}
=
(
e
1
,c
1
)
,...,
(
e
r
,c
r
)
,and
s
=
be any sequence. Now let
A
i,s
be the (
q × r
)DPmatrixusedto
compute Pr[
s D
i
s
1
,...,s
q
], and let
B
i,s
denote the last row of
A
i,s
,thatis,
B
i,s
[
]=
A
i,s
[
q,
]for
=1
,...,r
.For1
≤ k ≤ q
and 1
≤ ≤ r
,
A
[
k,
] will contain
], so
A
[
q, r
]isthevaluePr[
s D
i
Pr[
s
1
,...,s
k
(
e
1
,c
1
)
,...,
(
e
,c
)
]. We
set
A
[1
,
] = 1 for all
,1
≤ ≤ r
and
A
[
k,
1] = 0 for all 1
≤ k ≤ q
, and compute
the other values row-by-row. For 1
≤ k ≤ q
and 1
≤ ≤ r
, define:
=
c
if
s
k
⊆ e
0otherwise
,
c
k
(3)
where
c
k
is the probability that the element
s
k
is contained in the event
e
in
source
i
;If
s
k
⊆ e
,
c
k
is equal to the probability that
e
is associated with
source
i
, and 0 otherwise. Next, use the following recurrence: