Database Reference
In-Depth Information
such that
s
k
⊆ t
i
j
,for
k
=1
,...,q
.The
source sequence
D
i
corresponding to a
source
i
is just the multiset
, ordered by
eid
. For any sequence
s
, define its
support
in
D
, denoted
Sup
(
s, D
) is the number of sources
i
such
that
s D
i
. The objective is to find all sequences
s
such that
Sup
(
s, D
)
{e|
(
eid, e, i
)
∈ D}
≥ θm
for some user-defined threshold 0
<θ≤
1.
Probabilistic Databases.
We define an SLU
probabilistic database
D
p
to be an
ordered list
of records of the form (
eid
,e,
W
)where
eid
is an event-id,
e
is an event and
W
is a probability distribution over
r
1
,...,r
n
; the list is ordered by
eid
.
The distribution
W
contains pairs of the form (
σ, c
), where
σ ∈S
S
and 0
<c≤
1
is the confidence that the event
e
is associated with source
σ
and
(
σ,c
)
∈W
c
=1.
An example can be found in Table 1(L). The
possible worlds
semantics of
D
p
is as follows. A
possible world
D
∗
of
D
p
is generated by taking each event
e
i
in
turn, and assigning it to one of the possible sources
σ
i
∈ W
i
. Thus every record
r
i
=(
eid
i
,e
i
,W
i
)
in
D
∗
.
The complete set of possible worlds is obtained by enumerating all such possible
combinations. We assume that the distributions
W
i
associated with each record
r
i
∈ D
p
takes the form
r
i
=(
eid
i
,e
i
,σ
i
), for some
σ
i
∈S
in
D
p
are stochastically independent; the probability of a possible world
D
∗
is therefore Pr[
D
∗
]=
i
=1
Pr
W
i
[
σ
i
]. For example, a possible world
D
∗
for the
database of Table 1 can be generated by assigning event
e
1
to
Z
with probability
0
.
3, events
e
2
and
e
4
to
X
with probabilities 0
.
7and0
.
6 respectively, and event
e
3
to
Y
with probability 0
.
3, and Pr[
D
∗
]=0
.
3
×
0
.
7
×
0
.
3
×
0
.
6=0
.
0378.
Table 1.
An SLU event database (L) transformed to p-sequences (R). Note that the
events like
e
1
(marked with
†
on (R)) can only be associated with one of the sources
X, Y
and
Z
in any possible world
eid event
W
p-sequence
D
X
(
a, c, e
:0
.
1)
†
(
b, c, d
:0
.
7)(
a, d, e
:0
.
2)
(
b, c, e
:0
.
6)
D
Y
(
a, c, e
:0
.
6)
†
(
b, c, d
:0
.
3)(
a, d, e
:0
.
3)
D
Z
(
a, c, e
:0
.
3)
†
(
a, d, e
:0
.
5)(
b, c, e
:0
.
4)
e
1
(
a, c, e
) (
X
:0
.
1)(
Y
:0
.
6)(
Z
:0
.
3)
e
2
(
b, c, d
)
(
X
:0
.
7)(
Y
:0
.
3)
e
3
(
a, d, e
) (
X
:0
.
2)(
Y
:0
.
3)(
Z
:0
.
5)
e
4
(
b, c, e
)
(
X
:0
.
6)(
Z
:0
.
4)
As a possible world is a deterministic instance of a given probabilistic database,
concepts like the support of a sequence in a possible world do apply. The
expected
support
of a sequence
s
in
D
p
is computed as follows:
ES
(
s, D
p
)=
Pr[
D
∗
]
Sup
(
s, D
∗
)
,
∗
(1)
D
∗
∈PW
(
D
p
)
The problem we consider is:
Given an SLU probabilistic database
D
p
, determine all sequences
s
such
that
ES
(
s, D
p
)
≥ θm
, for some user-specified threshold
θ
,0
<θ≤
1.