Database Reference
In-Depth Information
1.
h
(
x
)=
x
for every
x
that occurs in
x
;and
2. if
U
(
t
) is a fact in
T
,then
U
(
h
(
t
)) is a fact in
T
.
If
x
is clear from the context, we say that
h
is simply a homomorphism from
T
to
T
.
Consider an example of two conjunctive queries:
z
U
1
(
x
,
z
)
U
2
(
y
,
z
,
x
)
Q
1
(
x
,
y
)=
∃
∧
Q
2
(
x
,
y
)=
∃
z
∃
w
∃
r
U
1
(
x
,
z
)
∧
U
1
(
v
,
w
)
∧
U
2
(
y
,
z
,
x
)
∧
U
2
(
r
,
w
,
v
)
.
The tableau for the first contains the facts
U
1
(
x
,
z
) and
U
2
(
y
,
z
,
x
); the tableau for the second
in addition contains the facts
U
1
(
v
,
w
) and
U
2
(
r
,
w
,
v
). One can then see that the map given
by:
h
(
v
)=
x
h
(
w
)=
z
h
(
r
)=
y
h
(
x
)=
x
h
(
y
)=
y
h
(
z
)=
z
is a homomorphism from
T
Q
2
to
T
Q
1
, as it preserves the free variables
x
and
y
, and sends
U
1
(
v
,
w
) into
U
1
(
x
,
z
) and
U
2
(
r
,
w
,
v
) into
U
2
(
y
,
z
,
x
).
The importance of homomorphisms comes from the fact that they allow one to test
conjunctive queries for containment. Namely, if we have two conjunctive queries
Q
(
x
) and
Q
(
x
) with tableaux (
T
Q
,
x
) and (
T
Q
,
x
),then
Q
⊆
Q
⇔
there is a homomorphism
h
:
T
Q
→
T
Q
.
Note that the direction of the homomorphism is opposite to the direction of containment.
For the above examples of queries
Q
1
and
Q
2
, we have both a homomorphism
T
Q
2
→
T
Q
1
and a homomorphism
T
Q
1
→
T
Q
2
(the identity), which implies
Q
1
=
Q
2
.
2.3 Incomplete data
We have seen in earlier examples that target instances may contain
incomplete informa-
tion
. Typically, incomplete information in databases is modeled by having two disjoint and
infinite sets of values that populate instances:
•
the set of
constants
, denoted by C
ONST
,and
•
the set of
nulls
, or variables, denoted by V
AR
.
In general the domain of an instance is a subset of C
ONST
V
AR
(although we shall as-
sume that domains of source instances are always contained in C
ONST
). We usually denote
constants by lowercase letters
a
,
b
,
c
,...
(or mention specific constants such as numbers or
strings), while nulls are denoted by symbols
∪
⊥
,
⊥
1
,
⊥
2
,etc.
Incomplete relational instances over C
ONST
∪
V
AR
are usually called
naıve
databases.
Note that a null
⊥∈
V
AR
appears at most once, we speak of
Codd
databases. If we talk about single relations,
⊥∈
V
AR
can appear multiple times in such an instance. If each null