Database Reference

In-Depth Information

Data and combined complexity

When we deal with database-motivated questions, it is important to specify precisely what

is viewed as an input to a problem. Consider, for now, query evaluation. A typical formu-

lation of this problem is as follows:

P
ROBLEM
: Q
UERY EVALUATION

I
NPUT
: A query
Q
, a database
D
, a tuple
t
.

Q
UESTION
: D s
t
belong to
Q
(
D
)?

We refer to the complexity of this problem as the
combined complexity
of query evaluation,

as both the query
Q
,andthedata-
D
and
t
- are included in the input.

In the database setting, however, queries are typically very small compared to data.

Hence, it makes sense to measure the complexity in terms of the data: this is what we

often are interested in. This is captured by the following modification of the query evalua-

tion problem:

P
ROBLEM
: Q
UERY EVALUATION
(
Q
)

I
NPUT
: A database
D
, a tuple
t
.

Q
UESTION
: D s
t
belong to
Q
(
D
)?

We refer to the complexity of this problem as the
data complexity
of query evaluation,

since the input consists only of data.

The two complexities can be dramatically different. For example:

•

For relational calculus (i.e., FO), the combined complexity is P
SPACE
-complete, while

the data complexity is in AC
0
.

•

For conjunctive queries (and their unions), the combined complexity is NP-complete,

while the data complexity is in AC
0
.

In fact, it is very common to have an exponential gap between data and combined com-

plexities. To give an intuitive explanation of where it comes from, one can see that a query

evaluation algorithm runs, very roughly, in time

O
(
Q
)
,where

refers to the size of

Q
. Thus, for a fixed
Q
, it is polynomial, but when
Q
is a part of the input, it takes exponen-

tial time. A careful implementation of this naıve algorithm results in P
SPACE
complexity.

For conjunctive queries of the form

D

Q

, one needs to guess witnesses for the existential

quantifiers, and then check if they make the formula

∃

x

α

true, which gives an NP algorithm.

When we consider problems related to data exchange, the input generally will be split

into two parts: the data part, involving instances (source and target), and the nondata part,

involving schema mappings and queries. If only data is viewed as the input, we shall be

talking about data complexity; if mappings and queries are part of the input as well, we

shall be talking about combined complexity.

α