Database Reference
In-Depth Information
inclusion-exclusion formula: all three queries
Q
,
Q
, and
Q
∨
Q
are inversion free, and the
claim follows by induction. If, on the other hand,
Q
=
Q
1
∨
Q
2
∨
...
and each
Q
i
is connected,
then it must have a root variable
x
i
. Write
Q
...)
: clearly
z
is a root
variable, and it is also a separator variable because for any two unifiable atoms
L
1
,L
2
,
z
occurs on
the same position
π
L
1
(
1
)
=∃
z.(Q
1
[
z/x
1
]∨
Q
2
[
z/x
2
]∨
π
L
2
(
1
)
. Thus, inversion free queries are
R
6
-safe queries and, therefore,
in
UCQ(P )
. The following proposition strengthen this observation:
=
Proposition 5.19
The following holds: UCQ(OBDD)
=
UCQ(OBDD,1)
=
“inversion free queries”.
The proposition says two things. On one hand, every inversion-free query admits an OBDD
whose size is linear in the size of the database: in fact, we will show that its width is 2
k
=
O(
1
)
,
where
k
is the total number of atoms in the hierarchical, inversion free expression of
Q
. Thus, the
width of the OBDD depends only on the query, not on the database instance, and therefore the size
of the OBDD is linear,
O(n)
, meaning that
Q
∈
UCQ(OBDD,1)
. On the other hand, if a query
is not inversion-free, then the size of the smallest OBDD grows exponentially in the size of the
database. For example, the proposition implies that the lineage of
Q
J
(
Example 4.7
) has an OBDD
whose size is linear in that of the input database, while the lineage of
Q
V
(
Example 4.8
) does not
have polynomial-size OBDDs.
We give here the main intuition behind the positive result, namely that an inversion free
query has an OBDD of width 2
k
. We illustrate with the example
Q
J
: the reader can derive the
general case from this example. Start by writing
Q
J
=
Q
1
∧
Q
2
, where
Q
1
=
R(x
1
), S(x
1
,y
1
)
and
Q
2
=
T(x
2
), S(x
2
,y
2
)
. Fix a database instance
D
. Then, the lineage
Q
1
is read-once, and
therefore it admits an OBDD of width 1. For example, on a database with with four tuples,
R(
1
), R(
2
)
and
S(
1
,
1
), S(
1
,
2
), S(
2
,
3
), S(
2
,
4
)
, its lineage is
X
1
Y
1
∨
X
2
Y
4
, and
its OBDD is shown in
Figure 5.2
. In general, the OBDD examines the tuples
S(i,j)
in row-
major order; that is, for some arbitrary data instance
D
, the variable order of the OBDD is
R(
1
), S(
1
,
1
), S(
1
,
2
),...,R(
2
), S(
2
,
1
), S(
2
,
2
),...
Next, we transform the OBDD such that
every path examines all the variables. For that, we must insert on the 0-edge from
R(
1
)
dummy nodes
S(
1
,
1
), S(
1
,
2
),...
, and we must insert in the 0-edge from
R(
2
)
the dummy
nodes
S(
2
,
1
), S(
2
,
2
),...
Thus, we have a “complete” OBDD for
Q
1
of width 2. Similarly,
we obtain a complete OBDD for
Q
2
of width 2, which reads the Boolean variables in the
same
order:
T(
1
), S(
1
,
1
), S(
1
,
2
),...,T(
2
), S(
2
,
1
), S(
2
,
2
),...
We insert the missing variables
T(
1
), T (
2
),...
in the first OBDD, and the missing variables
R(
1
), R(
2
),...
in the second OBDD,
without increasing the width. Now, we can synthesize an OBDD for
Q
=
Q
1
∧
Q
2
of width
4, by using the property on OBDD synthesis mentioned above (see also [
Wegener
,
2004
]). Thus,
Q
J
admits an OBDD of size 4
n
, where
n
is the total number of tuples in the database.
In general, we can synthesize the OBDD inductively on the structure of the query
Q
, provided
that the subqueries
Q
1
,Q
2
used
the same
variable order: this is possible for inversion-free queries be-
cause the variable order of the tuples in a relation
R(x
1
,x
2
,...)
is the lexicographic order, determined
by the attribute-order
π
R
(
1
), π
R
(
2
),...
If the query has an inversion, then the synthesis is no longer
X
1
Y
2
∨
X
2
Y
3
∨