Discovery and Correctness of Schema Mapping Transformations - Schema Matching and Mapping

Databases Reference

In-Depth Information

4.2

Theoretical Results on Core Computation

The first approach that has been studied to generate the core for a relational data

exchange problem is to generate the canonical solution, and then to apply a post-

processing algorithm for its core identification. It is known that computing the core

of an arbitrary relational instance with variables is NP-complete, as many NP-

complete problems (e.g., computing the core of a graph [ Fagin et al. 2005b ; Hell

and Nesetril 1992 ] or conjunctive query optimization [ Chandra and Merlin 1977 ])

can be reduced to it. In contrast with the case of computing the core of an arbitrary

instance, computing the core of a universal solution in data exchange can be done

in polynomial time.

In Fagin et al. [ 2005b ], an algorithm is presented that computes the core in

polynomial time in a restricted setting, that is, for a data exchange problem whose

source-to-target constraints are tgds and whose target constraints consist of arbitrary

egds only. More specifically, they proved that the core of a universal solution can be

computed in polynomial time in two settings: (1) when the set of target constraints

is empty, (2) when the set of target constraints contains egds only. To address these

goals, two different methods are provided.

A greedy algorithm, given a source instance I , first computes an universal solu-

tion J for I , if it exists, and then computes its core by successively removing tuples

from J , as long as I and the instance resulting in each step satisfy the s-t tgds

and the target constraints. Although the greedy algorithm is conceptually simple, it

requires the availability of the source instance I for the execution of the algorithm.

The blocks method does not require the availability of the source instance and is

based on the relationships among the labeled nulls of a canonical solution J .The

Gaifman graph of the nulls of J is an undirected graph in which the nodes are the

nulls of J and there exists an edge between two labeled nulls whenever there exists

some tuple in some relation of J in which both labeled nulls occur. A block of

nulls is the set of nulls in a connected component of the Gaifman graph of the nulls.

Given J as the result of applying the s-t tgds to a ground source instance S , the block

method starts from the observation that the Gaifman graph of the labeled nulls of

the result instance J consists of connected components whose size is bounded by a

constant b . The main step of the algorithm relies on the observation that checking

whether there is a homomorphism from any J

K ,where K is any set of instances

with such bound b , into any arbitrary other instance J 0 is feasible in polynomial

time. The algorithm works also for the case where the target constraints consist of

egds, which, when applied, can merge blocks by equating variables from different

blocks. Thus, after chasing J with egds, the resulting J 0 can lost the bounded block-

size property. However, the authors show an algorithm that looks at the nulls in J

and computes its core by successively finding and applying a sequence of small

useful endomorphisms; where useful means that at least one null disappears. More

practically, (1) the algorithm starts computing a canonical universal solution J 0 ,(2)

then it recursively generates a sequence of intermediate instances such that, given

the intermediate instance J i , there is a useful endomorphism that is the identity

everywhere except for a block from J i

2

to J i C1 ; (3) when the algorithm stops, the

Schema Matching and Mapping

Search WWH ::

Custom Search

Home