The reasoning is applied to R ∪ F: the set of rules (put in clausal form) and the set of facts generated as explained before. It aims at inferring all unit facts of the form Reconcile(i,j), ¬Reconcile(i,j), SynVals(v₁,v₂) and ¬SynVals(v₁,v₂). Several resolution strategies have been proposed to reduce the number of resolutions needed to obtain the proof of a theorem (for more details about these strategies, see (Chang & Lee, 1997)). We have chosen to use unit resolution (Henschen & Wos, 1974). It is a resolution strategy in which at least one of the two clauses involved in each resolution step is a unit clause, i.e., a clause reduced to a single literal. Unit resolution is complete for refutation in the case of Horn clauses without function symbols (Henschen & Wos, 1974). Furthermore, it runs in time linear in the size of the clause set (Forbus & de Kleer, 1993). The unit resolution algorithm that we have implemented computes the set of instantiated (ground) unit clauses that are contained in F or inferred by unit resolution on R ∪ F. Its termination is guaranteed because there are no function symbols in R ∪ F. Its completeness for deriving all the facts that are logically entailed has been established in (Saïs et al., 2009).
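The saturation performed by unit resolution can be sketched in Python under a few simplifying assumptions. The rules are taken to be already instantiated into ground clauses, literals are encoded as (atom, polarity) pairs, and the atom names in the example are hypothetical. The loop is a naive fixpoint written for readability. It does not reproduce the linear-time bookkeeping cited above, and it is not the authors' implementation.

```python
from typing import FrozenSet, Set, Tuple

# A ground literal: (atom, polarity); ("Reconcile(i,j)", False) stands for ¬Reconcile(i,j).
Literal = Tuple[str, bool]
Clause = FrozenSet[Literal]


def negate(lit: Literal) -> Literal:
    atom, positive = lit
    return (atom, not positive)


def unit_resolution(clauses: Set[Clause]) -> Set[Literal]:
    """Saturate a set of ground clauses under unit resolution and return
    every unit literal that is given or derived."""
    known: Set[Clause] = set(clauses)
    changed = True
    while changed:
        changed = False
        units = [next(iter(c)) for c in known if len(c) == 1]
        for unit in units:
            opposite = negate(unit)
            for clause in list(known):
                if opposite in clause:
                    # Resolve the unit against the clause: drop the complementary literal.
                    resolvent = clause - {opposite}
                    if resolvent not in known:
                        known.add(resolvent)
                        changed = True
    # An empty resolvent would signal an inconsistency; it is not a unit, so it is not returned.
    return {next(iter(c)) for c in known if len(c) == 1}


# Hypothetical ground instances of rules (in clausal form) and facts.
clauses = {
    frozenset({("SynVals(v1,v2)", True)}),
    frozenset({("SynVals(v1,v2)", False), ("Reconcile(i1,j1)", True)}),
    frozenset({("Reconcile(i1,j1)", False), ("Reconcile(i2,j2)", True)}),
    frozenset({("Reconcile(i3,j3)", False)}),
}
facts = unit_resolution(clauses)
```

In this example the fact SynVals(v1,v2) is chained through two rules, which yields Reconcile(i1,j1) and then Reconcile(i2,j2). The negative fact ¬Reconcile(i3,j3) is kept as it is.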
This equation system is solved by an iterative method inspired by the Jacobi method (Golub & Van Loan, 1996), which converges quickly on linear equation systems. The difficulty is that our equation system is not linear. The max function is used in the numerical translation of the functionality and inverse functionality axioms declared in the RDFS+ schema, and this makes the system nonlinear. Therefore, we had to prove the convergence of the iterative method for solving the resulting nonlinear equation system.
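A Jacobi-style iteration of the kind described above can be sketched as follows. This is not the authors' solver. The zero starting vector, the tolerance `epsilon`, the `max_iter` cap and the function name are all assumptions. Convergence is only checked numerically here, not proved. Each equation is a function of the previous iterate, so all components are updated simultaneously, as in Jacobi's method. A Gauss-Seidel variant would instead reuse components already updated in the current sweep.

```python
from typing import Callable, List, Sequence

# Each equation computes one variable from the whole previous iterate.
Equation = Callable[[Sequence[float]], float]


def solve_jacobi_style(equations: List[Equation],
                       epsilon: float = 1e-6,
                       max_iter: int = 1000) -> List[float]:
    """Iterate x(k+1) = F(x(k)) until no component moves by more than epsilon."""
    x = [0.0] * len(equations)  # assumption: start from all-zero similarities
    for _ in range(max_iter):
        new_x = [f(x) for f in equations]  # every update reads only the old iterate
        if max((abs(a - b) for a, b in zip(new_x, x)), default=0.0) <= epsilon:
            return new_x
        x = new_x
    raise RuntimeError("no convergence within max_iter iterations")


# Toy nonlinear system (hypothetical numbers): x0 = max(0.5, x1), x1 = max(0.2, (x0 + 0.6)/2)
toy = [
    lambda x: max(0.5, x[1]),
    lambda x: max(0.2, (x[0] + 0.6) / 2),
]
result = solve_jacobi_style(toy, epsilon=1e-9)  # both components approach 0.6
```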
N2R can be applied alone or in combination with L2R. In the latter case, the non-reconciliation results inferred by L2R are exploited to reduce the reconciliation space, i.e., the size of the equation system to be solved by N2R. In addition, the reconciliations, synonymies and non-synonymies inferred by L2R are used to set the values of the corresponding variables or constants in the equations.
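The way L2R results specialize the N2R system can be sketched with two small helpers. Both function names are hypothetical, and so are the encodings. A reconciled pair or a synonymy is fixed at 1.0, a non-synonymy at 0.0, and a pair proved distinct is simply dropped from the set of variables (an equation that mentioned it would then treat it as 0). These are illustrative choices, not values stated in the text.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def apply_l2r_results(candidate_pairs: Set[Pair],
                      reconciled: Set[Pair],
                      not_reconciled: Set[Pair]) -> Tuple[Set[Pair], Dict[Pair, float]]:
    """Shrink the reconciliation space and fix the variables L2R already decided.
    Returns (pairs still needing an N2R variable, pairs with a fixed value)."""
    remaining = candidate_pairs - not_reconciled       # L2R proved these pairs distinct
    fixed = {p: 1.0 for p in remaining & reconciled}   # assumption: reconciled -> 1.0
    unknown = remaining - reconciled
    return unknown, fixed


def apply_l2r_synonymies(value_sims: Dict[Pair, float],
                         synonyms: Set[Pair],
                         non_synonyms: Set[Pair]) -> Dict[Pair, float]:
    """Override computed value similarities with L2R's (non-)synonymy facts."""
    constants = dict(value_sims)
    for p in synonyms:
        constants[p] = 1.0   # assumption: SynVals -> 1.0
    for p in non_synonyms:
        constants[p] = 0.0   # assumption: ¬SynVals -> 0.0
    return constants


# Hypothetical reference and value identifiers.
unknown, fixed = apply_l2r_results(
    {("a1", "b1"), ("a2", "b2"), ("a3", "b3")},
    reconciled={("a1", "b1")},
    not_reconciled={("a3", "b3")},
)
constants = apply_l2r_synonymies({("v1", "w1"): 0.7, ("v2", "w2"): 0.4},
                                 synonyms={("v1", "w1")},
                                 non_synonyms={("v2", "w2")})
```

After specialization, only ("a2", "b2") still needs an N2R variable. The pair ("a1", "b1") enters the equations as a known value.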
We first use a simple example to illustrate how the equation system is built. Then, we describe how the similarity dependencies between references are modeled in an equation system, and we present the iterative method for solving it.
N2R: A Numerical Method for Reference Reconciliation
Example 2
Let us consider the data descriptions of Example 1 and the reference pairs <S1_r607, S2_r208>, <S_d1e5, S2_l6f2>, <S1_p112, S2_p222> and <S1_p112, S2_p232>.
The similarity score Simᵣ(ref, ref′) between the references ref and ref′ of each of these pairs is modeled by a variable: x₁ models Simᵣ(S1_r607, S2_r208), x₂ models Simᵣ(S1_p112, S2_p222), x₃ models Simᵣ(S1_p112, S2_p232), and x₄ models Simᵣ(S_d1e5, S2_l6f2).
We obtain the following equations, which model the dependencies between these variables:

$$x_1 = \max(0.68,\ x_2,\ x_3,\ x_4/4)$$
$$x_2 = \max(0.1,\ x_1/2)$$
$$x_3 = \max(0.9,\ x_1/2)$$
$$x_4 = \max(0.42,\ x_1)$$
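As a worked check on this system, assume that the iteration starts with every variable at 0 (the starting point is an assumption here). Assume also that each new value is computed from the previous iterate only. The vector lists (x₁, x₂, x₃, x₄):

```latex
\begin{aligned}
x^{(0)} &= (0,\ 0,\ 0,\ 0)\\
x^{(1)} &= (0.68,\ 0.1,\ 0.9,\ 0.42)\\
x^{(2)} &= \bigl(\max(0.68, 0.1, 0.9, 0.105),\ \max(0.1, 0.34),\ \max(0.9, 0.34),\ \max(0.42, 0.68)\bigr) = (0.9,\ 0.34,\ 0.9,\ 0.68)\\
x^{(3)} &= \bigl(\max(0.68, 0.34, 0.9, 0.17),\ \max(0.1, 0.45),\ \max(0.9, 0.45),\ \max(0.42, 0.9)\bigr) = (0.9,\ 0.45,\ 0.9,\ 0.9)\\
x^{(4)} &= (0.9,\ 0.45,\ 0.9,\ 0.9) = x^{(3)}
\end{aligned}
```

Under these assumptions, the iteration reaches a fixpoint after three steps: x₁ = 0.9, x₂ = 0.45, x₃ = 0.9 and x₄ = 0.9. Substituting these values back into the four equations confirms that each one holds.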
In this equation system, the first equation expresses that the variable x₁ strongly and equally depends on the variables x₂ and x₃. It also depends on x₄, but more weakly, through the term x₄/4.
N2R has two main distinguishing characteristics. First, it is fully unsupervised: it does not require any training phase on manually labeled data to set up coefficients or parameters. Second, it is based on equations that model the influence between similarities. In the equations, each variable represents the (unknown) similarity between two references. The similarities between attribute values are constants, computed with standard similarity measures on strings or on sets of strings. The functions modeling the influence between similarities combine maximum and average functions. This combination lets them take into account, in an appropriate way, the functionality and inverse functionality constraints declared in the RDFS+ schema.
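The text states only that the influence functions combine maximum and average functions in order to reflect (inverse) functionality. The sketch below shows one plausible reading, offered as an assumption rather than N2R's actual formula. Similarities reached through (inverse) functional properties are combined with max, so any single one of them can raise the score. The remaining common attributes contribute through their average, and the two parts are combined with max. The function name and the split into two argument lists are hypothetical.

```python
from typing import Sequence


def combined_similarity(functional_terms: Sequence[float],
                        other_terms: Sequence[float]) -> float:
    """Hypothetical right-hand side of one similarity equation.

    functional_terms: similarities reached through (inverse) functional properties;
                      any one of them can drive the result up on its own (max).
    other_terms:      similarities of the remaining common attributes, which only
                      contribute collectively (average).
    """
    f_max = max(functional_terms, default=0.0)
    f_avg = sum(other_terms) / len(other_terms) if other_terms else 0.0
    return max(f_max, f_avg)


print(combined_similarity([0.3], [0.75, 0.25]))  # prints 0.5: the average dominates
```

Using max rather than a weighted sum means that one highly similar functional neighbor is enough to make two references look alike. Non-functional attributes can only raise the score when they agree on average.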