Database Reference
In-Depth Information
This means that if the certain answer is to be processed further within the same frame-
work, based on tree patterns and
TQL
queries, our only concern should be its size.
What is then the complexity of finding a max-description, and how big can it be? In
particular, how big can the smallest of them, i.e., the core-description, be?
Let
|
T
|
be the number of nodes in
T
,andlet
T
be
∑
T
∈T
|
T
|
. We show next that we
can compute a max-description of
T
in E
XPTIME
in general, and in P
TIME
if the number
of trees in
is fixed.
Exercise 16.3
shows that even core descriptions can be exponential
in the number of trees in
T
.
T
Theorem 14.11
Let
T
=
{
T
1
,...,
T
n
}
be a finite set of XML trees. A max-description of
is computable in time polynomial in
T
n
n
T
.
Proof
For simplicity assume that every node has a single attribute; extension to the gen-
eral case is straightforward.
The pattern we are going to construct is a sort of consistent product of
T
1
,...,
T
n
.Pro-
ceed as follows. For the root, labeled with
r
, take the sequence (
),i.e.,these-
quence of roots of
T
1
,...,
T
n
. Then iteratively, under every node (
v
1
,...,
v
n
) put a new node
(
w
1
,...,
w
n
) for every sequence
w
1
,...,
w
n
such that
w
i
is a child of
v
i
in
T
i
,andall
w
i
's are
labeled with the same letter
ε
,...,
ε
and put a fresh variable in the attribute
slot. If for some node
v
=(
v
1
,...,
v
n
) some
v
i
is a leaf,
v
is a leaf as well.
Define
A
(
v
1
,...,
v
n
)=(
a
1
,...,
a
n
) where
a
i
is the data value attached to the node
v
i
.For
every node
v
such that
A
(
v
)=(
c
,...,
c
) for a constant
c
, replace the variable in
v
with the
constant
c
. For the remaining nodes, whenever
A
(
v
)=
A
(
w
), replace the variable in
w
with
the variable in
v
.
The constructed formula is clearly satisfied in every
T
i
. Let us see that it is indeed a
max-description. Suppose that
σ
. Label the node
σ
π
is satisfied in every
T
i
.Let
h
i
be a homomorphism from
π
to
T
i
. It is easy to see that
h
=(
h
1
,...,
h
n
) is a homomorphism from
π
to
.
The complexity of the algorithm is polynomial in the size of the output pattern, which
is bounded by
π
T
n
n
n
i
=1
(by the inequality between the arithmetic and the geo-
metric means for nonnegative numbers).
|
T
i
|≤
∏
14.3 Certain answers for
TQL
queries
When we try to compute certain answers to a query
Q
over a set
is usually
infinite. Hence, we need a finite representation of it. Recall also that finiteness guarantees
existence of max-descriptions. Such a finite representation comes in the form of a
basis
T
of trees,
T
B
of
, which is a set of patterns. Recall that each pattern can be viewed as a tree, so when
we write
T
B ⊆T
, we mean that for every
π
∈B
, the tree associated to
π
is in
T
.
Definition 14.12 A
basis
for a set of trees
T
is a set of patterns
B ⊆T
such that each
pattern in
T
is a model of some pattern from
B
:
T ⊆
{
Mod(
π
)
|
π
∈B}
.