instance J_i is the core solution. The polynomial-time bound is due to the total number of endomorphisms that the algorithm explores, which is at most n^b for each block of J_0, where b is the maximum number of existentially quantified variables over all the s-t tgds and n is the number of tuples in J_0.
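To make this search space concrete, here is a minimal sketch, in Python, of block-wise endomorphism enumeration. It is ours, not the original algorithm's code; the tuple-based instance encoding, the "_N" prefix for labeled nulls, and the Emp example are all illustrative assumptions.

```python
# A sketch (ours) of the search space behind the n^b bound: for each
# block of labeled nulls in the canonical solution J_0, candidate
# endomorphisms map the block's nulls (at most b of them) into dom(J_0).
from itertools import product

def is_null(v):
    # Assumption: labeled nulls are encoded as strings prefixed with "_N".
    return isinstance(v, str) and v.startswith("_N")

def blocks(instance):
    """Blocks = connected components of nulls that co-occur in a tuple."""
    parent = {}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, values in instance:
        nulls = [v for v in values if is_null(v)]
        for n in nulls:
            parent.setdefault(n, n)
        for a, c in zip(nulls, nulls[1:]):
            parent[find(a)] = find(c)
    groups = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

def proper_endomorphisms(instance, block):
    """Yield non-identity mappings of the block's nulls into dom(J_0)
    under which every tuple touching the block stays inside J_0."""
    dom = sorted({v for _, values in instance for v in values})
    facts = set(instance)
    touching = [t for t in instance if any(v in block for v in t[1])]
    nulls = sorted(block)
    for image in product(dom, repeat=len(nulls)):   # <= |dom|^b candidates
        h = dict(zip(nulls, image))
        mapped = [(r, tuple(h.get(v, v) for v in vs)) for r, vs in touching]
        if all(t in facts for t in mapped) and any(h[n] != n for n in nulls):
            yield h

# Hypothetical canonical solution: Emp(Alice, _N1) is subsumed by Emp(Alice, Rome).
J0 = [("Emp", ("Alice", "Rome")), ("Emp", ("Alice", "_N1"))]
for blk in blocks(J0):
    for h in proper_endomorphisms(J0, blk):
        print(h)   # {'_N1': 'Rome'}
```

Even on this toy instance the enumeration is exponential in the block size b; the overall bound is polynomial only because b is fixed by the mapping scenario, not by the data.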
Gottlob and Nash [2008] extended these results with an algorithm that computes, still in polynomial time, the core solution of a data exchange problem whose target constraints are weakly acyclic tgds and arbitrary egds. The authors introduce novel technical ideas to compute the core of universal solutions and prove two complexity bounds. Using an exhaustive enumeration algorithm, they get
an upper bound of O(v·m·|dom(J)|^b), where v is the number of variables in the canonical solution J, m is the size of J, and b is the block size of J. There exist cases where a better bound can be achieved by relying on hypertree decomposition techniques. In such cases, the upper bound is O(v·m^(⌊b/2⌋+2)), with special benefits if the target constraints of the data exchange scenario are LAV tgds.
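As a quick, purely illustrative comparison of the two bounds (the parameter values below are our own, not from the paper):

```python
# Compare the exhaustive-enumeration bound O(v*m*|dom(J)|^b) with the
# hypertree-decomposition bound O(v*m^(floor(b/2)+2)).
def enumeration_bound(v, m, dom, b):
    return v * m * dom ** b

def hypertree_bound(v, m, b):
    return v * m ** (b // 2 + 2)

v, m, dom, b = 100, 10_000, 10_000, 4            # hypothetical sizes
print(f"{enumeration_bound(v, m, dom, b):.1e}")  # 1.0e+22
print(f"{hypertree_bound(v, m, b):.1e}")         # 1.0e+18
```

With b = 4 and a domain comparable in size to the instance, the hypertree bound here is four orders of magnitude smaller, which is the sense in which a better bound can sometimes be achieved.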
The main algorithm in Gottlob and Nash [2008] has been revised in Savenkov and Pichler [2008] (by removing the simulation of egds with full tgds) and in Marnette [2009] (by replacing a key component of the algorithm with a faster one). Also, an implementation of the core-computation algorithm in Gottlob and Nash [2008] has been developed [Savenkov and Pichler 2008]: the prototype uses a relational DBMS to chase tgds and egds, and a specialized engine to find endomorphisms and minimize the universal solution.
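To give a flavor of the DBMS-based part of such a pipeline, the following self-contained sketch (ours, not the prototype's code; the tgd, the schema, and the "_N" null encoding are hypothetical) chases one s-t tgd with a plain INSERT ... SELECT, inventing a labeled null per source tuple:

```python
# Sketch: chasing the hypothetical s-t tgd
#   Emp(name, city) -> ∃N. TEmp(name, N), TCity(N, city)
# inside a relational DBMS (SQLite here).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Emp(name TEXT, city TEXT);   -- source
    CREATE TABLE TEmp(name TEXT, n TEXT);     -- target
    CREATE TABLE TCity(n TEXT, city TEXT);    -- target
    INSERT INTO Emp VALUES ('Alice', 'Rome'), ('Bob', 'Oslo');
""")

# One chase step: derive the labeled null from the source row id, so
# both target tuples built from the same Emp row share the same null.
conn.executescript("""
    INSERT INTO TEmp  SELECT name, '_N' || rowid FROM Emp;
    INSERT INTO TCity SELECT '_N' || rowid, city FROM Emp;
""")

print(conn.execute("SELECT * FROM TEmp").fetchall())
# [('Alice', '_N1'), ('Bob', '_N2')]
print(conn.execute("SELECT * FROM TCity").fetchall())
# [('_N1', 'Rome'), ('_N2', 'Oslo')]
```

Deriving the null from the source row plays the role of a Skolem term: the existential variable N is instantiated consistently across the two target tuples, as the tgd requires.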
The algorithms above provide a very general solution to the problem of computing core solutions for a data exchange setting, and they represent significant steps toward the goal of integrating core computation into schema mapping systems. However, experience with these algorithms shows that, although polynomial, they require very long computing times, since they search for all possible endomorphisms among the tuples of the canonical solution [Savenkov and Pichler 2008; Mecca et al. 2009a]. As a consequence, recursive algorithms that iterate over intermediate instances hardly scale to large mapping scenarios: the need for approaches more scalable than such post-processing motivated the works that follow.
4.3 Generating Core Solutions with SQL Scripts
The fact that the s-t tgds produced by first-generation schema mapping systems may generate redundancy in the target has motivated several practical proposals aimed at removing such redundant data. Unfortunately, some of these works are applicable only in restricted cases and do not represent a general solution to the problem. Only recently have there been proposals for general rewriting techniques that are able to obtain core solutions with executable scripts. As we discuss next, all the mapping systems that attempted to reduce the redundancy in the solutions started from the formalism and the algorithms in Popa et al. [2002].
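As a minimal illustration of what such executable scripts can do (our example; the table, the "_N" null encoding, and the subsumption test are deliberately simplified with respect to the actual rewritings), the script below deletes a target tuple whose city is a labeled null whenever a tuple with the same name and a constant city makes it redundant:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TEmp(name TEXT, city TEXT);
    INSERT INTO TEmp VALUES
        ('Alice', 'Rome'),   -- informative tuple
        ('Alice', '_N1'),    -- redundant: subsumed by ('Alice', 'Rome')
        ('Bob',   '_N2');    -- not subsumed: must stay in the core
""")

# Executable redundancy removal: drop null-carrying tuples that are
# subsumed by a constant tuple for the same employee.
conn.execute("""
    DELETE FROM TEmp
    WHERE substr(city, 1, 2) = '_N'
      AND EXISTS (SELECT 1 FROM TEmp t2
                  WHERE t2.name = TEmp.name
                    AND substr(t2.city, 1, 2) <> '_N')
""")
print(conn.execute("SELECT * FROM TEmp ORDER BY name").fetchall())
# [('Alice', 'Rome'), ('Bob', '_N2')]
```

The point of the SQL-based approaches discussed next is precisely to generate such subsumption and difference checks automatically, for arbitrary mappings rather than for one hand-written case.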
Early attempts. Nested mappings [Fuxman et al. 2006] extend the language of s-t tgds by allowing the arbitrary nesting of mapping formulas within other mappings.