Databases Reference
In-Depth Information
4.2
Theoretical Results on Core Computation
The first approach that has been studied to generate the core for a relational data
exchange problem is to generate the canonical solution, and then to apply a post-
processing algorithm for its core identification. It is known that computing the core
of an arbitrary relational instance with variables is NP-complete, as many NP-
complete problems (e.g., computing the core of a graph [ Fagin et al. 2005b ; Hell
and Nesetril 1992 ] or conjunctive query optimization [ Chandra and Merlin 1977 ])
can be reduced to it. In contrast with the case of computing the core of an arbitrary
instance, computing the core of a universal solution in data exchange can be done
in polynomial time.
In Fagin et al. [ 2005b ], an algorithm is presented that computes the core in
polynomial time in a restricted setting, that is, for a data exchange problem whose
source-to-target constraints are tgds and whose target constraints consist of arbitrary
egds only. More specifically, they proved that the core of a universal solution can be
computed in polynomial time in two settings: (1) when the set of target constraints
is empty, (2) when the set of target constraints contains egds only. To address these
goals, two different methods are provided.
A greedy algorithm, given a source instance I , first computes an universal solu-
tion J for I , if it exists, and then computes its core by successively removing tuples
from J , as long as I and the instance resulting in each step satisfy the s-t tgds
and the target constraints. Although the greedy algorithm is conceptually simple, it
requires the availability of the source instance I for the execution of the algorithm.
The blocks method does not require the availability of the source instance and is
based on the relationships among the labeled nulls of a canonical solution J .The
Gaifman graph of the nulls of J is an undirected graph in which the nodes are the
nulls of J and there exists an edge between two labeled nulls whenever there exists
some tuple in some relation of J in which both labeled nulls occur. A block of
nulls is the set of nulls in a connected component of the Gaifman graph of the nulls.
Given J as the result of applying the s-t tgds to a ground source instance S , the block
method starts from the observation that the Gaifman graph of the labeled nulls of
the result instance J consists of connected components whose size is bounded by a
constant b . The main step of the algorithm relies on the observation that checking
whether there is a homomorphism from any J
K ,where K is any set of instances
with such bound b , into any arbitrary other instance J 0 is feasible in polynomial
time. The algorithm works also for the case where the target constraints consist of
egds, which, when applied, can merge blocks by equating variables from different
blocks. Thus, after chasing J with egds, the resulting J 0 can lost the bounded block-
size property. However, the authors show an algorithm that looks at the nulls in J
and computes its core by successively finding and applying a sequence of small
useful endomorphisms; where useful means that at least one null disappears. More
practically, (1) the algorithm starts computing a canonical universal solution J 0 ,(2)
then it recursively generates a sequence of intermediate instances such that, given
the intermediate instance J i , there is a useful endomorphism that is the identity
everywhere except for a block from J i
2
to J i C1 ; (3) when the algorithm stops, the
Search WWH ::




Custom Search