Discovery and Correctness of Schema Mapping Transformations - Schema Matching and Mapping

First of all, within data exchange theory the core has been studied only for
relational settings; to date there is no formal definition of core solutions for
nested scenarios. We believe such a notion is needed in many practical scenarios.

Postprocessing algorithms [Fagin et al. 2005b; Gottlob and Nash 2008; Savenkov
and Pichler 2008; Marnette 2009] can handle scenarios with arbitrary target
constraints, while with the rewriting algorithms in Mecca et al. [2009a] and
ten Cate et al. [2009], the best we can achieve is to generate a solution that
does not consider target tgds and egds. This is especially unsatisfactory for
egds, since the obtained solution violates the required key constraints and is
not even a legal instance for the target. As shown in Marnette et al. [2010],
this may lead to a high level of redundancy, which can seriously impair both the
efficiency of the translation and the quality of query answering over the target
database.
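To make the problem concrete, the following is a minimal sketch of how a solution produced without considering a key egd can fail to be a legal target instance. The relation `Person(name, birth_year, city)`, its key on `name`, the sample tuples, and the helper `key_violations` are all hypothetical and chosen only for illustration; labeled nulls are written as plain strings such as `"N1"`.

```python
from collections import defaultdict

def key_violations(tuples, key_positions):
    """Group tuples by their key values and return the groups that contain
    more than one distinct tuple, i.e. the violations of the key."""
    groups = defaultdict(set)
    for t in tuples:
        key = tuple(t[i] for i in key_positions)
        groups[key].add(t)
    return {k: sorted(v, key=repr) for k, v in groups.items() if len(v) > 1}

# Hypothetical target instance obtained by firing only the s-t tgds.
# Two different tgds contributed partial information about "alice".
target_person = {
    ("alice", "1980", "N1"),   # city unknown: labeled null N1
    ("alice", "N2", "Rome"),   # birth year unknown: labeled null N2
    ("bob", "1975", "Paris"),
}

conflicts = key_violations(target_person, key_positions=(0,))
for key, rows in conflicts.items():
    print("key", key, "is shared by", rows)
```

In this toy instance the two tuples for `alice` agree on the key but differ elsewhere, so the instance cannot be loaded into a target table that declares `name` as a key.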

In fact, handling egds is a complicated task. As conjectured in ten Cate et al.
[2009], it has recently been shown [Marnette et al. 2010] that it is not
possible, in general, to obtain a universal solution that enforces a set of egds
using a first-order language such as SQL. For the class of target egds that
correspond to functional dependencies, the most common in practical settings,
Marnette et al. [2010] introduced a best-effort rewriting algorithm that takes
as input a scenario with s-t tgds and egds and, whenever possible, rewrites it
into a new scenario without egds. Moreover, this algorithm can be combined with
existing mapping rewriting algorithms [Mecca et al. 2009a; ten Cate et al. 2009]
to obtain SQL scripts that generate core solutions. The paper shows that
handling target egds efficiently is possible in many practical cases. This is
particularly important in real-world applications of mappings, where key
constraints are often present and play an important role.
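For intuition about what enforcing a functional dependency involves, the sketch below applies the classical chase-style step for an egd: whenever two tuples agree on the key, their remaining values are equated, replacing a labeled null by the other value or failing if two distinct constants clash. This is a procedural illustration only, not the rewriting algorithm of Marnette et al. [2010]; the `Null` class, the function names, and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Null:
    """A labeled null (hypothetical representation for this sketch)."""
    label: str

def substitute(tuples, old, new):
    """Replace every occurrence of the null `old` by `new` in the instance."""
    return {tuple(new if v == old else v for v in t) for t in tuples}

def find_egd_step(tuples, key_positions):
    """Return a pair (a, b) of differing values that the egd forces to be
    equal, or None when the functional dependency is already satisfied."""
    rows = list(tuples)
    for i, t in enumerate(rows):
        for u in rows[i + 1:]:
            if any(t[k] != u[k] for k in key_positions):
                continue
            for a, b in zip(t, u):
                if a != b:
                    return a, b
    return None

def chase_fd(tuples, key_positions):
    """Enforce the functional dependency key_positions -> all attributes."""
    tuples = set(tuples)
    while (step := find_egd_step(tuples, key_positions)) is not None:
        a, b = step
        if isinstance(a, Null):
            tuples = substitute(tuples, a, b)
        elif isinstance(b, Null):
            tuples = substitute(tuples, b, a)
        else:
            # Two distinct constants: the chase fails, no solution exists.
            raise ValueError(f"egd violated by constants {a!r} and {b!r}")
    return tuples

instance = {
    ("alice", "1980", Null("N1")),
    ("alice", Null("N2"), "Rome"),
    ("bob", "1975", "Paris"),
}
result = chase_fd(instance, key_positions=(0,))
```

The loop equates values one pair at a time and repeats until no step applies, which is the kind of iteration to a fixpoint that a single first-order query cannot express in general.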

Another important open problem concerns the expressibility of the GUI of a
schema mapping tool. Indeed, many GUIs are limited in the set of primitives they
use to specify mapping scenarios and need to be enriched in several ways. For
instance, it would be useful to be able to duplicate sets in the source and in
the target and, thus, handle tgds that contain duplicate tables. More generally,
full control over joins in the two data sources, beyond the joins that
correspond to foreign key constraints, becomes a crucial requirement of schema
mapping GUIs; with this feature, users can specify arbitrary join paths,
including self-joins.

This richer set of primitives poses some challenges with respect to the mapping
generation and rewriting algorithms as well. In particular, duplications in the
target correspond to different ways of contributing tuples to the same set. As
we discussed above, this makes the generation of core solutions more delicate,
since there exist tgds that write more than one tuple at a time in the same
target table, and therefore redundancy can be generated not only across
different tgds, but also by firing a single tgd [Mecca et al. 2009a; ten Cate
et al. 2009].
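As an illustration of redundancy produced by a single tgd, consider the hypothetical tgd Edge(x, y) → ∃L1, L2: Node(x, L1) ∧ Node(y, L2), which writes two tuples into the same target table each time it fires. The sketch below fires it and then drops tuples that can be mapped onto another tuple by assigning values to their nulls. The relation names, the `_N` prefix used to mark nulls, and the helper functions are assumptions of this example. The per-tuple check is valid here only because each null occurs in exactly one tuple; computing a core in general requires reasoning about homomorphisms over the whole instance.

```python
from itertools import count

_fresh = count(1)

def fresh_null():
    """Create a new labeled null, encoded as a string with prefix '_N'."""
    return f"_N{next(_fresh)}"

def is_null(value):
    return isinstance(value, str) and value.startswith("_N")

def fire_tgd(edges):
    """Edge(x, y) -> exists L1, L2: Node(x, L1) AND Node(y, L2)."""
    nodes = []
    for x, y in edges:
        nodes.append((x, fresh_null()))
        nodes.append((y, fresh_null()))
    return nodes

def subsumed_by(t, u):
    """True if t maps onto u by sending t's nulls to u's values while
    keeping constants fixed."""
    mapping = {}
    for a, b in zip(t, u):
        if is_null(a):
            if mapping.setdefault(a, b) != b:
                return False
        elif a != b:
            return False
    return True

def drop_redundant(tuples):
    """Keep a tuple only if no already kept tuple subsumes it. Tuples with
    fewer nulls are considered first, so more informative tuples win."""
    ordered = sorted(tuples, key=lambda t: sum(is_null(v) for v in t))
    kept = []
    for t in ordered:
        if not any(subsumed_by(t, u) for u in kept):
            kept.append(t)
    return kept

# A self-loop makes one firing of the tgd write two tuples for node "a".
nodes = fire_tgd([("a", "a")])
core_like = drop_redundant(nodes)
print(nodes)
print(core_like)
```

With the self-loop `("a", "a")`, one firing yields two `Node` tuples for `a` that differ only in their nulls, and one of them is dropped as redundant.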

Second-generation mapping systems have certainly enlarged the class of mapping
scenarios that can be handled using a GUI, but a formal characterization of the
exact class of mappings that can be expressed with them is still missing. For
instance, it is still unclear whether every mapping made of conjunctive queries
can be expressed by existing GUIs.
