Discovery and Correctness of Schema Mapping Transformations - Schema Matching and Mapping

Databases Reference

In-Depth Information

algorithms allows to have similar benefits to those gained by nested mappings in

different relational settings. In particular, those techniques generate target data with

less redundancy, as source data involved in the same target key constraints is copied

by the generated queries only once.

Unfortunately, the approaches above are applicable only in some specific cases:

the above techniques benefits apply only when schemas and correspondences obey

to certain structures or require the presence of key constraints to reduce the redun-

dancy in the output. Therefore, those approaches do not represent a general solution

to the problem of generating neither core nor compact universal solutions.

SQL core-generation algorithms. The following systems introduce core computa-

tion algorithms that, given a set of s-t tgds, enable a more efficient implementation

by means of executable scripts that scale well to large databases. This problem

has been first approached in Chiticariu [ 2005 ], where an algorithm is presented

for schema mappings specified by the limited class of s-t tgds with single atomic

formulas (without repetition of existential quantified variables) in the conclusions.

The first complete proposal of an algorithm for rewriting s-t tgds to generate

core solutions was introduced in Mecca et al. [ 2009a ]. This work is based on the

exploiting of two key ideas: the notion of homomorphism among formulas and the

use of negation to rewrite tgds.

m 1 :

s;n

NYSE .s;n/

Company .I;n;s/

m 2 :

n;c;a;pi

Public-Company .n;c/

Public-Grant .a;pi;n/

I;S

Company .I;n;S/

Grant .a;I/

m 3 :

i;n;s

NSF-Grantee .i;n;s/

Company .i;n;s/

m 4 :

a;c

NSF-Grant .a;c/

Grant .a;c/:

The first intuition is that it is possible to analyze the set of formulas to recognize

when two tgds may generate redundant tuples in the target. This happens when it

is possible to find a homomorphism between the right-hand sides of the two tgds.

Consider the right-hand sides of the s-t tgds m 1 and m 3 from the Example in Sect. 2

reported here for convenience; with an abuse of notation, we treat the two formu-

las as sets of tuples, with existentially quantified variables that correspond to nulls.

It can be seen that the conclusion Company .I;n;s/ of m 1

can be mapped into the

conclusion Company .i;n;s/ of m 3

by the following mapping of variables: I

i ,

s ; in this case, they say that m 3 subsumes m 1 . This gives a condition to

intercept possible redundancy that is general (i.e., key constraint on the target is not

required to identify causes of redundancy) and necessary, since the actual generation

of endomorphisms among facts in the target data depends on values coming from

the source. From the complexity viewpoint, checking for the presence of homomor-

phisms among formulas, i.e., conclusions of tgds, is completely different than doing

the same check among instance tuples: since the number of tgds is typically order

of magnitudes smaller than the size of an instance, the check among formulas can

be carried out very quickly.

n , s

Schema Matching and Mapping

Search WWH ::

Custom Search

Home