Database Reference
In-Depth Information
R
FID
SSN
N
P
351
185
Smith
0.2
351
785
Smith
0.8
352
185
Brown
0.3
352
186
Brown
0.7
Figure 2.7:
A BID table. There are two blocks (defined by the FID attribute): tuples in each block are
disjoint, while tuples across blocks are independent. Note that in every possible world the attribute FID
is a key (see
Figure 2.3
); this justifies underlining it.
not have negation. Next, one can check that, for any monotone query
Q
,if
W
is an incomplete
database with a maximal element, then
Q(
W
)
is also an incomplete database with a maximal
element. Indeed, apply
Q
to the maximal world in
W
: the result is a maximal world in
Q(
W
)
,
by the monotonicity of
Q
. Therefore, tuple-independent tables extended with UCQ views can only
represent probabilistic databases that have a maximal possible world. Since there exists probabilistic
databases without a maximal element (for example, consider the c-table in
Figure 2.4
(b) and extend
it with an arbitrary probability distribution on the assignments of
X, Y
), the claim of the proposition
follows.
2.7.2 BID DATABASES
A
block-independent-disjoint database
, or BID database, is a probabilistic database where the set of
possible tuples can be partitioned into blocks, such every block is included in a single relation
5
, and
the following property holds: all tuples in a block are
disjoint
probabilistic events, and all tuples from
different blocks are
independent
probabilistic events. If the database consists of a single table, then
we call it a BID-table. For a simple example, every tuple-independent table is, in particular, a BID
table where each tuple is a block by itself.
Every BID table can be represented by a simple pc-table where all possible tuples in the same
block,
t
1
,t
2
,...
are annotated with atomic events
X
=
1,
X
=
2,
...
, where
X
is a unique variable
used only for that block; this is shown, for example, in
Figure 2.2
. Furthermore, set the probabilities
as
p
i
=
P(X
=
i)
=
P(t
i
∈
W)
.
In practice, we use a simpler representation of a BID table
R
, as follows. We choose a set of
attributes
A
1
,A
2
,...
of
R
that uniquely identify the block to which the tuple belongs: these will
be called
key attributes
because they form, indeed, a key in every possible world. Then, we add a
probability attribute
P
. Thus, the schema of a BID table is
R(
A
1
,...,A
k
,B
1
,...,B
m
,P)
. For an
illustration, consider representing the BID table for
Figure 2.2
: this is shown in
Figure 2.7
. The
attribute FID uniquely identifies a block (this is why it is underlined); tuples within a block are
disjoint, and their probabilities add up to 1
.
0, while tuples across blocks are independent. To help
visualize the blocks, we separate them by horizontal lines.
5
That is, a block cannot contain two tuples from two distinct relations
R
and
S
.