ated: for example, many problems related to reasoning about dependencies are complete
for complexity classes such as NP, or CO NP, or P SPACE .
2. Dealingwith data. These are the key problems such as querying or updating the data.
Of course, given the typically large size of databases, only low-complexity algorithms
are tolerated when one handles data. For example, the complexity of evaluating a fixed
relational algebra query is very low (AC 0 , to be precise), and even more expressive
languages such as Datalog stay in P TIME .
In data exchange, the key tasks too can be split into two groups. For static analysis tasks,
we treat schema mappings as first-class citizens. The questions one deals with are generally
of two kinds:
• Consistency . For these questions, the input is a schema mapping
, and the question is
whether it makes sense: for example, whether there exists a source S that has a solution
, or whether all sources of a given schema have solutions. These analyses are
important for ruling out “bad” mappings that are unlikely to be useful in data exchange.
Operations on mappings . Suppose we have a mapping
from a source schema R s to a
M that uses R t as the source schema and maps
it into a schema R u . Can we combine these mappings into one, the composition of the
target schema R t , and another mapping
M ◦M , which maps R s to R u ? Or can we invert a mapping, and find a mapping
M − 1
as much original information about the source as possible? These questions arise when
one considers schema evolution: as schemas evolve, so do the mappings between them.
And once we understand when and how we can construct mappings such as
from R t into R s , that undoes the transformation performed by
M ◦M or
M − 1 , we need to understand their properties with respect to the “existence of solutions”
Tasks involving data are generally of two kinds.
Materializing target instances . Suppose we have a schema mapping
and a source
instance S . Which target instance do we materialize? As we already saw, there could be
many - perhaps infinitely many - target instances which are solutions for S under
Choosing one we should think of three criteria:
1. it should faithfully represent the information from the source, under the constraints
imposed by the mapping;
2. it should not contain (too much) redundant information;
3. the computational cost of constructing the solution should be reasonable.
Query answering . Ultimately, we want to answer queries against the target schema. As
we explained, due the existence of multiple solutions, we need to answer them in a way
that is consistent with the source data. So if we have a materialized target T and a query
Q , we need to find a way of evaluating it to produce the set of certain answers. As we
shall see, sometimes computing Q ( T ) does not give us certain answers, so we may need
to change Q into another query Q and then evaluate Q on a chosen solution to get the