First of all, within data exchange theory the core has been studied only for
relational settings; to date, there is no formal definition of core solutions for
nested scenarios. We believe such a notion is needed in many practical scenarios.
Postprocessing algorithms [Fagin et al. 2005b; Gottlob and Nash 2008; Savenkov
and Pichler 2008; Marnette 2009] can handle scenarios with arbitrary target
constraints, whereas with the rewriting algorithms of Mecca et al. [2009a] and
ten Cate et al. [2009] the best we can achieve is a solution that does not take
target tgds and egds into account. This is especially unsatisfactory for egds, since
the resulting solution violates the required key constraints and is not even a legal
instance of the target. As shown in Marnette et al. [2010], this may lead to a high
level of redundancy, which can seriously impair both the efficiency of the
translation and the quality of query answers over the target database.
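The problem can be illustrated with a minimal sketch. The scenario is hypothetical and not taken from the cited papers: two source tables S1(name, age) and S2(name, city), a target table Person(name, age, city) with key name, and two s-t tgds that each fill in what they know and invent a labeled null for the rest. Applying the tgds while ignoring the key egd produces two Person tuples for the same name.

```python
from itertools import count


class Null:
    """A labeled null, i.e. a value invented for an existential variable."""
    _ids = count(1)

    def __init__(self):
        self.label = f"N{next(Null._ids)}"

    def __repr__(self):
        return self.label


def apply_st_tgds(s1, s2):
    """Fire two hypothetical s-t tgds, ignoring the target key on name.

    S1(n, a) -> exists C. Person(n, a, C)
    S2(n, c) -> exists A. Person(n, A, c)
    """
    person = []
    for name, age in s1:
        person.append((name, age, Null()))
    for name, city in s2:
        person.append((name, Null(), city))
    return person


def key_violations(person):
    """Return pairs of Person tuples that share the same key value (name)."""
    first_seen = {}
    violations = []
    for tup in person:
        key = tup[0]
        if key in first_seen:
            violations.append((first_seen[key], tup))
        else:
            first_seen[key] = tup
    return violations


person = apply_st_tgds([("ann", 30)], [("ann", "Rome")])
print(person)
print(key_violations(person))
```

Both tuples describe the same person, so the instance is redundant and, because the key on name is violated, it is not a legal target instance.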
In fact, handling egds is a complicated task. As conjectured in ten Cate et al.
[2009], it has recently been shown [Marnette et al. 2010] that it is not possible, in
general, to obtain a universal solution that enforces a set of egds using a
first-order language such as SQL. For the class of target egds that correspond to
functional dependencies, the most common in practical settings, Marnette et al.
[2010] introduced a best-effort rewriting algorithm that takes as input a scenario
with s-t tgds and egds and, whenever possible, rewrites it into a new scenario
without egds. Moreover, this algorithm can be combined with existing mapping
rewriting algorithms [Mecca et al. 2009a; ten Cate et al. 2009] to obtain SQL
scripts that generate core solutions. That paper shows that handling target egds
efficiently is possible in many practical cases. This is particularly important in
real-world applications of mappings, where key constraints are often present and
play an important role.
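To give a feel for what an egd-free scenario can look like, here is a hand-written toy sketch. It is not the algorithm of Marnette et al. [2010]. It assumes hypothetical source tables S1(name, age) and S2(name, city), a target table Person(name, age, city) with key name, and, as a further assumption, that name is also a key in each source table. Under those assumptions the key egd can be folded into the mapping by joining the sources on name, so that each name produces exactly one target tuple. For brevity, None stands in for a labeled null.

```python
def rewritten_mapping(s1, s2):
    """Produce Person(name, age, city) with one tuple per name.

    Equivalent in spirit to three egd-free rules:
      S1(n, a) and S2(n, c)              -> Person(n, a, c)
      S1(n, a) and no S2 tuple for n     -> Person(n, a, null)
      S2(n, c) and no S1 tuple for n     -> Person(n, null, c)
    """
    ages = dict(s1)    # name -> age, relies on name being a key of S1
    cities = dict(s2)  # name -> city, relies on name being a key of S2
    person = []
    for name in sorted(ages.keys() | cities.keys()):
        # dict.get returns None when the other source has no tuple for name
        person.append((name, ages.get(name), cities.get(name)))
    return person


print(rewritten_mapping([("ann", 30), ("bob", 41)], [("ann", "Rome")]))
```

Because every name appears once in the output, the key on name holds by construction and no separate egd enforcement step is needed for this toy case.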
Another important open problem concerns the expressiveness of the GUI of a
schema mapping tool. Indeed, many GUIs are limited in the set of primitives they
use to specify mapping scenarios and need to be enriched in several ways. For
instance, it would be useful to be able to duplicate sets in the source and in the
target and, thus, handle tgds that contain duplicate tables. Going further, full
control over joins in the two data sources, beyond the joins that correspond to
foreign key constraints, becomes a crucial requirement for schema mapping GUIs;
with this feature, users can specify arbitrary join paths, including self-joins.
This richer set of primitives poses some challenges with respect to the mapping
generation and rewriting algorithms as well. In particular, duplications in the target
correspond to different ways of contributing tuples to the same set. As we discussed
above, this makes the generation of core solutions more delicate, since there exist
tgds that write more than one tuple at a time into the same target table, and
therefore redundancy can be generated not only across different tgds, but also by
firing a single tgd [Mecca et al. 2009a; ten Cate et al. 2009].
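A small sketch may help make this concrete. It uses a hypothetical tgd, S(x, y) -> exists N. T(x, y) and T(x, N), in which the target table T appears twice. A single firing writes two tuples into T, and the one containing the null adds no information beyond the other. The redundancy check below is a deliberate simplification: it only compares a tuple containing nulls against null-free tuples and assumes each null occurs in a single tuple. It is not a general core computation.

```python
from itertools import count

_fresh = count(1)


def fresh_null():
    """Create a labeled null, represented as a tagged pair."""
    return ("null", next(_fresh))


def is_null(value):
    return isinstance(value, tuple) and len(value) == 2 and value[0] == "null"


def fire_tgd(source):
    """Fire the hypothetical tgd S(x, y) -> exists N. T(x, y), T(x, N)."""
    target = []
    for x, y in source:
        target.append((x, y))             # first occurrence of T
        target.append((x, fresh_null()))  # second occurrence of T
    return target


def subsumed(t, u):
    """True if t has nulls, u has none, and t maps onto u position by position."""
    return (
        any(is_null(a) for a in t)
        and not any(is_null(b) for b in u)
        and all(is_null(a) or a == b for a, b in zip(t, u))
    )


def remove_redundant(target):
    """Drop tuples that are subsumed by some null-free tuple."""
    return [t for t in target if not any(subsumed(t, u) for u in target)]


target = fire_tgd([("a", "b")])
print(target)
print(remove_redundant(target))
```

Here the redundant tuple comes from the same firing that produced the informative one, so comparing the outputs of different tgds alone would not detect it.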
Second-generation mapping systems have certainly enlarged the class of mapping
scenarios that can be handled using a GUI, but a formal characterization of
the exact class of mappings that can be expressed with them is still missing. For
instance, it is still unclear if every mapping made of conjunctive queries can be
expressed by existing GUIs.