Figure 5. A query (on the left side) and the extracted data (on the right side) from S_1
to the data of S_1 and S_2. The problem consists in
deciding whether or not two references are
reconciled. Let Reconcile be a binary predicate:
Reconcile(X, Y) means that the two references
denoted by X and Y refer to the same real-world entity.
The reference reconciliation problem considered
in L2R consists in extracting from the set I_1 × I_2
of reference pairs two subsets REC and NREC
such that:
reconciliation (¬Reconcile(i', j')) from a set of facts
and a set of rules which transpose the semantics of
the data sources and of the schema into logical
dependencies between reference reconciliations. Facts
of synonymy (SynVals(v_1, v_2)) and of non-synonymy
(¬SynVals(u_1, u_2)) between basic values (strings,
dates) are also inferred. For instance, the synonymy
SynVals(“JoDS”, “Journal of Data Semantics”)
may be inferred. The distinguishing features of L2R
are that it is global and logic-based: every constraint
declared on the data and on the schema in RDFS+ is
automatically translated into first-order logic Horn
rules (rules for short) that express dependencies
between reconciliations. The advantage of such
a logical approach is that if the data are error-free
and if the declared constraints are valid, then the
inferred reconciliations and non-reconciliations
are correct, which guarantees 100%
precision of the results.
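The inference described above can be illustrated with a minimal sketch of forward chaining over ground Horn rules. This is not the L2R implementation: the reference identifiers (j1, j2, p1, p2, q2), the title values t1 and t2, the Disjoint predicate, the three rules and the encoding of ¬Reconcile as a separate NotReconcile predicate are all hypothetical choices made for illustration. Only the SynVals(“JoDS”, “Journal of Data Semantics”) fact comes from the text.

```python
def forward_chain(facts, rules):
    """Apply ground Horn rules (body -> head) until no new fact is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return known


# Initial facts (hypothetical toy data, except the JoDS synonymy).
facts = {
    ("SynVals", "JoDS", "Journal of Data Semantics"),
    ("SynVals", "t1", "t2"),      # assumed: two synonymous paper titles
    ("Disjoint", "p1", "q2"),     # assumed: p1 and q2 belong to disjoint classes
}

# Ground rules written as (body, head); each is an illustrative assumption.
rules = [
    # Synonymous journal names reconcile the two journal references.
    (frozenset({("SynVals", "JoDS", "Journal of Data Semantics")}),
     ("Reconcile", "j1", "j2")),
    # A dependency between reconciliations: reconciled journals plus
    # synonymous titles reconcile the two paper references.
    (frozenset({("Reconcile", "j1", "j2"), ("SynVals", "t1", "t2")}),
     ("Reconcile", "p1", "p2")),
    # Disjoint classes yield a non-reconciliation.
    (frozenset({("Disjoint", "p1", "q2")}),
     ("NotReconcile", "p1", "q2")),
]

closure = forward_chain(facts, rules)
REC = {(f[1], f[2]) for f in closure if f[0] == "Reconcile"}
NREC = {(f[1], f[2]) for f in closure if f[0] == "NotReconcile"}
```

In this sketch the second rule fires only after the first has derived Reconcile(j1, j2), which is the sense in which reconciliations depend on one another. Pairs for which neither fact is derived stay undecided.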
We first describe the generation of the
reconciliation rules. Then we present the generation
of the facts and finally the reasoning, which is
performed on the set of rules and facts.
\[
REC = \{ (i, i') \mid \mathit{Reconcile}(i, i') \}
\]
\[
NREC = \{ (i, i') \mid \neg \mathit{Reconcile}(i, i') \}
\]
The reference reconciliation problem considered
in N2R consists in, given a similarity function
Sim_r: I_1 × I_2 → [0..1] and a threshold T_rec (a real
value in [0..1] given by an expert, fixed
experimentally or learned on a labeled data sample),
computing the following set:
\[
REC_{N2R} = \{ (i, i') \in (I_1 \times I_2) \setminus (REC \cup NREC) \mid Sim_r(i, i') > T_{rec} \}
\]
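The set defined above can be computed directly once a similarity function and a threshold are available. The sketch below is only an illustration under stated assumptions: the reference identifiers, their labels, the token-based Jaccard similarity standing in for Sim_r, and the threshold value 0.5 are all hypothetical. Only the selection logic (keep the pairs not already decided whose similarity is strictly above T_rec) follows the definition.

```python
from itertools import product

# Hypothetical labels attached to references of the two sources.
labels = {
    "a": "journal of data semantics",
    "c": "vldb journal",
    "b": "journal on data semantics",
    "d": "the vldb journal",
}
I1 = ["a", "c"]
I2 = ["b", "d"]


def sim_r(i, j):
    """Assumed stand-in for Sim_r: Jaccard similarity of label tokens."""
    ti, tj = set(labels[i].split()), set(labels[j].split())
    union = ti | tj
    return len(ti & tj) / len(union) if union else 0.0


def n2r_reconcile(I1, I2, rec, nrec, sim, t_rec):
    """Pairs of I1 x I2 outside rec and nrec whose similarity exceeds t_rec."""
    decided = rec | nrec
    return {
        (i, j)
        for i, j in product(I1, I2)
        if (i, j) not in decided and sim(i, j) > t_rec
    }


# Assume the logical step already reconciled (c, d), so it is not reconsidered.
rec = {("c", "d")}
nrec = set()
rec_n2r = n2r_reconcile(I1, I2, rec, nrec, sim_r, t_rec=0.5)
```

With these toy labels the pair (a, b) scores 0.6 and is kept, while (c, d) is skipped because it is already in REC even though its similarity is also above the threshold.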
L2R: A Logical Method for Reference Reconciliation
Generation of the Set of Reconciliation Rules
The reconciliation rules are automatically generated
from the constraints that are declared on the data
sources and on their common schema.
L2R (Saïs et al., 2007) is based on the inference of
facts of reconciliation (Reconcile(i, j)) and of non-