Databases Reference
In-Depth Information
algorithms allows to have similar benefits to those gained by nested mappings in
different relational settings. In particular, those techniques generate target data with
less redundancy, as source data involved in the same target key constraints is copied
by the generated queries only once.
Unfortunately, the approaches above are applicable only in some specific cases:
the above techniques benefits apply only when schemas and correspondences obey
to certain structures or require the presence of key constraints to reduce the redun-
dancy in the output. Therefore, those approaches do not represent a general solution
to the problem of generating neither core nor compact universal solutions.
SQL core-generation algorithms. The following systems introduce core computa-
tion algorithms that, given a set of s-t tgds, enable a more efficient implementation
by means of executable scripts that scale well to large databases. This problem
has been first approached in Chiticariu [ 2005 ], where an algorithm is presented
for schema mappings specified by the limited class of s-t tgds with single atomic
formulas (without repetition of existential quantified variables) in the conclusions.
The first complete proposal of an algorithm for rewriting s-t tgds to generate
core solutions was introduced in Mecca et al. [ 2009a ]. This work is based on the
exploiting of two key ideas: the notion of homomorphism among formulas and the
use of negation to rewrite tgds.
m 1 :
8
s;n
W
NYSE .s;n/
!9
I
W
Company .I;n;s/
m 2 :
8
n;c;a;pi
W
Public-Company .n;c/
^
Public-Grant .a;pi;n/
!
9
I;S
W
Company .I;n;S/
^
Grant .a;I/
m 3 :
8
i;n;s
W
NSF-Grantee .i;n;s/
!
Company .i;n;s/
m 4 :
8
a;c
W
NSF-Grant .a;c/
!
Grant .a;c/:
The first intuition is that it is possible to analyze the set of formulas to recognize
when two tgds may generate redundant tuples in the target. This happens when it
is possible to find a homomorphism between the right-hand sides of the two tgds.
Consider the right-hand sides of the s-t tgds m 1 and m 3 from the Example in Sect. 2
reported here for convenience; with an abuse of notation, we treat the two formu-
las as sets of tuples, with existentially quantified variables that correspond to nulls.
It can be seen that the conclusion Company .I;n;s/ of m 1
can be mapped into the
conclusion Company .i;n;s/ of m 3
by the following mapping of variables: I
!
i ,
n
s ; in this case, they say that m 3 subsumes m 1 . This gives a condition to
intercept possible redundancy that is general (i.e., key constraint on the target is not
required to identify causes of redundancy) and necessary, since the actual generation
of endomorphisms among facts in the target data depends on values coming from
the source. From the complexity viewpoint, checking for the presence of homomor-
phisms among formulas, i.e., conclusions of tgds, is completely different than doing
the same check among instance tuples: since the number of tgds is typically order
of magnitudes smaller than the size of an instance, the check among formulas can
be carried out very quickly.
!
n , s
!
Search WWH ::




Custom Search