Database Reference
In-Depth Information
Proof
The
CO
NP upper bounds follow from
Theorem 13.3
. Below we prove the lower
bound for the first claim. The proof for the second claim can be obtained by replacing
with
+
. The third claim is left for the reader (see
Exercise 16.3
).
The reduction is very similar to the one used in the proof of
Proposition 13.5
, only the
selection of literals for each clause is done by permuting the data values in the
L
i
-children:
we choose the literal encoded by the data value stored in
L
1
.
Thus, the source DTD
D
s
remains the same, the target DTD
D
t
is equal to
D
s
, the st-tgds
Σ
st
are
↓
r
/
C
[
L
1
(
x
)
,
L
2
(
y
)
,
L
3
(
z
)]
→
r
/
C
[ (
x
)
,
(
y
)
,
(
z
)]
,
r
/
V
(
x
,
y
)
→
r
/
V
(
x
,
y
)
,
yr
V
(
x
,
y
)
,
C
/
L
1
(
x
)
,
C
/
L
1
(
y
)
.
and the query is
∃
x
∃
Hardness due to queries
Let us now move to the analysis of the query language. One lesson learned from the re-
lational case is that inequality in the query language leads to
CO
NP-hardness. Since the
usual translation from the relational setting to the XML setting produces mappings from
SM
nr
(
,
=), we immediately get
CO
NP-hardness even for the simplest mappings (we have
already seen that for more expressive mappings the problem may be undecidable).
↓
SM
nr
(
Corollary 13.7
There exist a mapping
M ∈
↓
,
=)
and a query Q
∈
CTQ(
↓
,
=
,
=)
such that
CERTAIN
M
(
Q
)
is
CO
NP
-complete.
Similarly, allowing any form of horizontal navigation in queries leads to intractability even
for the simplest mappings.
SM
nr
(
Proposition 13.8
There exist a schema mapping
M ∈
↓
)
, and queries Q
1
∈
CTQ(
↓
+
,
=)
,
,
→
,
=)
and Q
2
∈
CTQ(
↓
,
→
such that both
CERTAIN
M
(
Q
1
)
and
CERTAIN
M
(
Q
2
)
are
CO
NP
-complete.
Proof
The proof is almost identical as for
Proposition 13.6
. The mapping uses the same
D
s
.ThetargetDTD
D
t
is
C
∗
V
∗
r
→
V
:@
a
1
,
@
a
2
L
∗
C
→
L
:@
b
and the st-tgds
Σ
st
are
r
/
C
[
L
1
(
x
)
,
L
2
(
y
)
,
L
3
(
z
)]
→
r
/
C
[
L
(
x
)
,
L
(
y
)
,
L
(
z
)]
,
r
/
V
(
x
,
y
)
→
r
/
V
(
x
,
y
)
.
Intuitively, we choose the literal having more than two following siblings. Since each
C
node has at least three
L
children, clearly at least one literal is chosen for each clause. The
query
Q
1
is just
yr
L
(
x
,
y
)
,
C
[
L
(
x
)
L
]
. Replacing
∃
x
∃
→
L
→
L
]
,
C
[
L
(
y
)
→
L
→
→
with
+
gives
Q
2
.
→