Database Reference
InDepth Information
language of ground conjunctive queries
: that is, conjunctions of atoms
R
(
a
),where
R
is a
relation, and
a
is a tuple of constants. Then
Q
(
D
) represents the maximum knowledge
we can extract from
Q
(
D
) with respect to this language. This is the familiar notion of
certain answers.
However, if we use the more expressive
language of conjunctive queries
, we can extract
additional knowledge from
Q
(
D
). For instance, suppose
Q
(
D
) consists of two instances:
{
R
(
a
,
a
)
,
R
(
b
,
c
)
}
and
{
R
(
a
,
a
)
,
R
(
d
,
c
)
}
.
Then the conjunctive query
R
(
a
,
a
)
) than the inter
section of the instances above, which is just
R
(
a
,
a
). This certain knowledge about
Q
(
∧∃
xR
(
x
,
c
) tells us more about
Q
(
D
D
)
would be traditionally presented as a naıve table: that is, a table with variables (nulls). The
naıve table corresponding to
R
(
a
,
a
)
∧∃
xR
(
x
,
c
) is simply
{
R
(
a
,
a
)
,
R
(
x
,
c
)
}
. Notice that the
class of databases that satisfy
R
(
a
,
a
)
): every valuation of
x
in
{
R
(
a
,
a
)
,
R
(
x
,
c
)
}
gives such a database. Nevertheless, conjunctive queries are not able
to extract any more knowledge about
Q
(
∧∃
xR
(
x
,
c
) is not equal to
Q
(
D
D
), and thus narrow down the set of described
databases. The formula
R
(
a
,
a
)
∧∃
xR
(
x
,
c
) is as close to a definition of
Q
(
D
) as possible
within the language of conjunctive queries. Let us formalize these intuitions.
We define the notion of a
maxdescription
of a set
of databases that, in a given lan
guage, expresses the information we can infer with certainty from
D
D
. Certain answers to
queries are then a special case of maxdescriptions, applied to sets
{
Q
(
D
)
}
as
D
ranges
over a collection of databases.
To explain this notion, assume that
L
is a logical formalism in which we express prop
erties of databases from
D
(e.g., conjunctive queries, or ground conjunctive queries such
as
R
(
a
,
a
)). A set
Φ
of formulae of
L
defines a set of databases, called models of
Φ
and
denoted Mod(
Φ
), consisting of all databases satisfying each formula in
Φ
:
Mod(
Φ
)=
{
D

D

=
ϕ
for every
ϕ
∈
Φ
}
.
To describe
D
fully in
L
we would need its
L
definition: a finite set
Φ
of
L
formulae
such that
). This is not always achievable, so instead we settle for the next
best thing, which is an
D
= Mod(
Φ
L
definition of the set of models of
certain knowledge
about
D
expressed in
L
.
This certain
L
knowledge of the class
D
, called
L
theory of
D
, is the set of all formu
lae from
L
satisfied in all databases from
D
:
Th
L
(
D
)=
{
ϕ
∈L 
D

=
ϕ
for every
D
∈D}
.
The most precise (finite) description of
D
we can express in
L
is a finite definition of
Mod(Th
L
(
D
)), i.e., a finite set
Φ
of formulae of
L
such that
Mod(
Φ
)=Mod(Th
L
(
D
))
.
of XML trees. Since our setting always
includes a schema, we will be making the following assumptions about such sets
Let us now apply this general definition to sets
T
T
: