Data exchange by example - Foundations of Data Exchange

Database Reference

In-Depth Information

ated: for example, many problems related to reasoning about dependencies are complete

for complexity classes such as NP, or CO NP, or P SPACE .

2. Dealingwith data. These are the key problems such as querying or updating the data.

Of course, given the typically large size of databases, only low-complexity algorithms

are tolerated when one handles data. For example, the complexity of evaluating a fixed

relational algebra query is very low (AC 0 , to be precise), and even more expressive

languages such as Datalog stay in P TIME .

In data exchange, the key tasks too can be split into two groups. For static analysis tasks,

we treat schema mappings as first-class citizens. The questions one deals with are generally

of two kinds:

• Consistency . For these questions, the input is a schema mapping

, and the question is

whether it makes sense: for example, whether there exists a source S that has a solution

under

M

, or whether all sources of a given schema have solutions. These analyses are

important for ruling out “bad” mappings that are unlikely to be useful in data exchange.

M

•

Operations on mappings . Suppose we have a mapping

M

from a source schema R s to a

M that uses R t as the source schema and maps

it into a schema R u . Can we combine these mappings into one, the composition of the

two,

target schema R t , and another mapping

M ◦M , which maps R s to R u ? Or can we invert a mapping, and find a mapping

M − 1

and recovers

as much original information about the source as possible? These questions arise when

one considers schema evolution: as schemas evolve, so do the mappings between them.

And once we understand when and how we can construct mappings such as

from R t into R s , that undoes the transformation performed by

M

M ◦M or

M − 1 , we need to understand their properties with respect to the “existence of solutions”

problem.

Tasks involving data are generally of two kinds.

•

Materializing target instances . Suppose we have a schema mapping

and a source

instance S . Which target instance do we materialize? As we already saw, there could be

many - perhaps infinitely many - target instances which are solutions for S under

M

.

Choosing one we should think of three criteria:

1. it should faithfully represent the information from the source, under the constraints

imposed by the mapping;

2. it should not contain (too much) redundant information;

3. the computational cost of constructing the solution should be reasonable.

•

Query answering . Ultimately, we want to answer queries against the target schema. As

we explained, due the existence of multiple solutions, we need to answer them in a way

that is consistent with the source data. So if we have a materialized target T and a query

Q , we need to find a way of evaluating it to produce the set of certain answers. As we

shall see, sometimes computing Q ( T ) does not give us certain answers, so we may need

to change Q into another query Q and then evaluate Q on a chosen solution to get the

Foundations of Data Exchange

Search WWH ::

Custom Search

Home