Discovery and Correctness of Schema Mapping Transformations - Schema Matching and Mapping

Databases Reference

In-Depth Information

introduced universal solutions for a data exchange scenario. Sufficient conditions

for the existence of a universal solution have been defined for weakly acyclic

tgds [ Fagin et al. 2005a ]. In this special case, polynomial-time algorithms can be

defined to determine whether a solution exists and to produce a particular solution,

the canonical universal solution (as defined in Sect. 2 ). By analyzing query answer-

ing issues in greater detail, Fagin et al. [ 2005a ] focuses on determining which target

queries can be answered using solely the materialized target instance, and studies

the computational complexity of computing certain answers for target queries.

Given a fixed data exchange scenario

. S ; T ;˙ st ;˙ t / , for each target query

q , the problem is to study the computational complexity of the following problem:

given a source instance I , find the certain answers of q with respect to I .If q

is a union of conjunctive queries, the certain answers of q can be computed on

an arbitrary canonical universal solution. Having this solution homomorphisms to

all solutions, and being computable in polynomial time, it implies that the certain

answers of q as union of conjunctive queries can also be computed in polyno-

mial time. However, if conjunctive queries have inequalities, computing the certain

answers becomes a coNP-complete problem [ Abiteboul and Duschka 1998 ]. In par-

ticular, in Fagin et al. [ 2005a ], it is shown that computing the certain answers of

unions of conjunctive queries with at most two equalities per disjunct is a coNP-

complete problem. Beyond the intractability result for the case with two or more

inequalities, Fagin et al. [ 2005a ] show that there is a polynomial time algorithm for

computing certain answers of queries with at most one inequality per disjunct (thus

overcoming the result in Abiteboul and Duschka [ 1998 ]).

Fagin et al. [ 2005a ] focus on the relational case, whereas Yu and Popa [ 2004 ]

extend the target query answering to the XML data model, by also covering the

presence of target constraints (also called nested equality-generating dependen-

cies (NEGDS) . The latter presence further complicates the problem of defining

the correct query answering semantics, since merging rules at the target have to

be taken into account. In Yu and Popa [ 2004 ], a nested extension of the relational

chase [ Fagin et al. 2005a ] is used to accomodate XML target instances. A basic

query rewriting algorithm is presented that consists of four phases, precisely rule

generation, query translation, query optimization, and query assembly. The basic

version ignores the presence of target constraints. Rule generation is done by cre-

ating a rule for each root of the target schema, and by taking the mappings into

consideration. The goal of this phase is to set of mapping rules that fully specify

the target in terms of the sources, and to prepare the target expressions that will

be substituted by source expressions in the next phase. Query translation is done

by translating the target query into a set of decorrelated source queries, by exploit-

ing the set of mappings. Optimization is then performed to eliminate equal Skolem

terms that have been introduced in the previous phase and to guarantee the min-

imization of the rewriting, as one of the cases in Deutsch et al. [ 1999 ]. Finally,

decorrelated queries get assembled into nested source queries.

The above steps are modified when target constraints are present, since the above

query rewriting becomes incomplete. A resolution step needs to be performed before

query optimization takes place, to exhaustively explore all possible rewritings and

M D

Schema Matching and Mapping

Search WWH ::

Custom Search

Home