instance J_i is the core solution. The polynomial-time bound is due to the total number of endomorphisms that the algorithm explores, which is at most n^b for each block of J_0, where b is the maximum number of existentially quantified variables over all the s-t tgds and n is the number of tuples in J_0.
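To make this search space concrete, here is a minimal sketch, in Python, of block-wise endomorphism enumeration. It is ours, not the original algorithm's code; the tuple-based instance encoding, the "_N" prefix for labeled nulls, and the Emp example are all illustrative assumptions.

```python
# A sketch (ours) of the search space behind the n^b bound: for each
# block of labeled nulls in the canonical solution J_0, candidate
# endomorphisms map the block's nulls (at most b of them) into dom(J_0).
from itertools import product

def is_null(v):
    # Assumption: labeled nulls are encoded as strings prefixed with "_N".
    return isinstance(v, str) and v.startswith("_N")

def blocks(instance):
    """Blocks = connected components of nulls that co-occur in a tuple."""
    parent = {}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for _, values in instance:
        nulls = [v for v in values if is_null(v)]
        for n in nulls:
            parent.setdefault(n, n)
        for a, c in zip(nulls, nulls[1:]):
            parent[find(a)] = find(c)
    groups = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

def proper_endomorphisms(instance, block):
    """Yield non-identity mappings of the block's nulls into dom(J_0)
    under which every tuple touching the block stays inside J_0."""
    dom = sorted({v for _, values in instance for v in values})
    facts = set(instance)
    touching = [t for t in instance if any(v in block for v in t[1])]
    nulls = sorted(block)
    for image in product(dom, repeat=len(nulls)):   # <= |dom|^b candidates
        h = dict(zip(nulls, image))
        mapped = [(r, tuple(h.get(v, v) for v in vs)) for r, vs in touching]
        if all(t in facts for t in mapped) and any(h[n] != n for n in nulls):
            yield h

# Hypothetical canonical solution: Emp(Alice, _N1) is subsumed by Emp(Alice, Rome).
J0 = [("Emp", ("Alice", "Rome")), ("Emp", ("Alice", "_N1"))]
for blk in blocks(J0):
    for h in proper_endomorphisms(J0, blk):
        print(h)   # {'_N1': 'Rome'}
```

Even on this toy instance the enumeration is exponential in the block size b; the overall bound is polynomial only because b is fixed by the mapping scenario, not by the data.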
Gottlob and Nash [2008] extended these results with an algorithm that computes, still in polynomial time, the core solution of a data exchange problem whose target constraints are weakly acyclic tgds and arbitrary egds. The authors introduce novel technical ideas to compute the core of universal solutions and prove two complexity bounds. Using an exhaustive enumeration algorithm, they get
an upper bound of O(v·m·|dom(J)|^b), where v is the number of variables in the canonical solution J, m is the size of J, and b is the block size of J. There exist cases where a better bound can be achieved by relying on hypertree decomposition techniques. In such cases, the upper bound is O(v·m^(⌊b/2⌋+2)), with special benefits if the target constraints of the data exchange scenario are LAV tgds.
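As a quick, purely illustrative comparison of the two bounds (the parameter values below are our own, not from the paper):

```python
# Compare the exhaustive-enumeration bound O(v*m*|dom(J)|^b) with the
# hypertree-decomposition bound O(v*m^(floor(b/2)+2)).
def enumeration_bound(v, m, dom, b):
    return v * m * dom ** b

def hypertree_bound(v, m, b):
    return v * m ** (b // 2 + 2)

v, m, dom, b = 100, 10_000, 10_000, 4            # hypothetical sizes
print(f"{enumeration_bound(v, m, dom, b):.1e}")  # 1.0e+22
print(f"{hypertree_bound(v, m, b):.1e}")         # 1.0e+18
```

With b = 4 and a domain comparable in size to the instance, the hypertree bound here is four orders of magnitude smaller, which is the sense in which a better bound can sometimes be achieved.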
The main algorithm in Gottlob and Nash [2008] has been revised in Savenkov and Pichler [2008] (by removing the simulation of egds with full tgds) and in Marnette [2009] (by replacing a key component of the algorithm with a faster one). Also, an implementation of the core-computation algorithm in Gottlob and Nash [2008] has been developed [Savenkov and Pichler 2008]: the prototype uses a relational DBMS to chase tgds and egds, and a specialized engine to find endomorphisms and minimize the universal solution.
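To give a flavor of the DBMS-based part of such a pipeline, the following self-contained sketch (ours, not the prototype's code; the tgd, the schema, and the "_N" null encoding are hypothetical) chases one s-t tgd with a plain INSERT ... SELECT, inventing a labeled null per source tuple:

```python
# Sketch: chasing the hypothetical s-t tgd
#   Emp(name, city) -> ∃N. TEmp(name, N), TCity(N, city)
# inside a relational DBMS (SQLite here).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Emp(name TEXT, city TEXT);   -- source
    CREATE TABLE TEmp(name TEXT, n TEXT);     -- target
    CREATE TABLE TCity(n TEXT, city TEXT);    -- target
    INSERT INTO Emp VALUES ('Alice', 'Rome'), ('Bob', 'Oslo');
""")

# One chase step: derive the labeled null from the source row id, so
# both target tuples built from the same Emp row share the same null.
conn.executescript("""
    INSERT INTO TEmp  SELECT name, '_N' || rowid FROM Emp;
    INSERT INTO TCity SELECT '_N' || rowid, city FROM Emp;
""")

print(conn.execute("SELECT * FROM TEmp").fetchall())
# [('Alice', '_N1'), ('Bob', '_N2')]
print(conn.execute("SELECT * FROM TCity").fetchall())
# [('_N1', 'Rome'), ('_N2', 'Oslo')]
```

Deriving the null from the source row plays the role of a Skolem term: the existential variable N is instantiated consistently across the two target tuples, as the tgd requires.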
The algorithms above provide a very general solution to the problem of computing core solutions for a data exchange setting, and they represent significant steps toward the goal of integrating core computation into schema mapping systems. However, experience with these algorithms shows that, although polynomial, they require very long computing times, since they search for all possible endomorphisms among the tuples of the canonical solution [Savenkov and Pichler 2008; Mecca et al. 2009a]. As a consequence, recursive algorithms that iterate over intermediate instances hardly scale to large mapping scenarios: the need for approaches more scalable than such post-processing motivated the works that follow.
4.3 Generating Core Solutions with SQL Scripts
The fact that the s-t tgds produced by first-generation schema mapping systems may generate redundancy in the target has motivated several practical proposals aimed at removing such redundant data. Unfortunately, some of these works are applicable only in restricted cases and do not represent a general solution to the problem. Only recently have there been proposals for general rewriting techniques that are able to obtain core solutions with executable scripts. As we discuss next, all the mapping systems that attempted to reduce the redundancy in the solutions started from the formalism and the algorithms in Popa et al. [2002].
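As a minimal illustration of what such executable scripts can do (our example; the table, the "_N" null encoding, and the subsumption test are deliberately simplified with respect to the actual rewritings), the script below deletes a target tuple whose city is a labeled null whenever a tuple with the same name and a constant city makes it redundant:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TEmp(name TEXT, city TEXT);
    INSERT INTO TEmp VALUES
        ('Alice', 'Rome'),   -- informative tuple
        ('Alice', '_N1'),    -- redundant: subsumed by ('Alice', 'Rome')
        ('Bob',   '_N2');    -- not subsumed: must stay in the core
""")

# Executable redundancy removal: drop null-carrying tuples that are
# subsumed by a constant tuple for the same employee.
conn.execute("""
    DELETE FROM TEmp
    WHERE substr(city, 1, 2) = '_N'
      AND EXISTS (SELECT 1 FROM TEmp t2
                  WHERE t2.name = TEmp.name
                    AND substr(t2.city, 1, 2) <> '_N')
""")
print(conn.execute("SELECT * FROM TEmp ORDER BY name").fetchall())
# [('Alice', 'Rome'), ('Bob', '_N2')]
```

The point of the SQL-based approaches discussed next is precisely to generate such subsumption and difference checks automatically, for arbitrary mappings rather than for one hand-written case.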
Early attempts. Nested mappings [Fuxman et al. 2006] extend the language of s-t tgds by allowing the arbitrary nesting of mapping formulas within other mappings.