Database Reference
In-Depth Information
Table 1. Example of iterative similarity computation
Iterations                        0     1     2     3     4
x1 = max(0.68, x2, x3, ¼ * x4)    0     0.68  0.9   0.9   0.9
x2 = max(0.1, ½ * x1)             0     0.1   0.34  0.45  0.45
x3 = max(0.9, ½ * x1)             0     0.9   0.9   0.9   0.9
x4 = max(0.42, x1)                0     0.42  0.68  0.9   0.9
tions of the reference pairs. The weights used in the value computation of the variables x1, x2, x3 and x4 are respectively: λ11 = ¼, λ21 = ½, λ31 = ½ and λ41 = ½.
We assume that the fixpoint precision ε is equal to 0.005.
The equation system is the one given in Example 2. The successive iterations of the resulting similarity computation are shown in Table 1. The solution of the equation system is X = (0.9, 0.45, 0.9, 0.9), which gives the similarity scores of the four reference pairs. The fixpoint is reached after four iterations, at which point the error vector is equal to 0. If we fix the reconciliation threshold T_rec at 0.80, we obtain three reconciliation decisions: two cities, two museums and two paintings.
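The iteration in Table 1 can be reproduced with a few lines of Python. This is only a minimal sketch, not the N2R implementation: it uses the four equations exactly as printed in Table 1, starts every variable at 0, and assumes that the computation stops when the largest change between two successive iterations falls below ε. The names `step`, `solve` and `T_REC`, the stopping rule, and the `>=` comparison against the threshold are assumptions made for illustration.

```python
def step(x):
    """Apply the four equations of Table 1 once, using the previous values."""
    x1, x2, x3, x4 = x
    return (
        max(0.68, x2, x3, 0.25 * x4),
        max(0.1, 0.5 * x1),
        max(0.9, 0.5 * x1),
        max(0.42, x1),
    )


def solve(epsilon=0.005, max_iterations=100):
    """Iterate from the zero vector until the change is below epsilon.

    The stopping rule (largest absolute change < epsilon) is an assumption.
    """
    x = (0.0, 0.0, 0.0, 0.0)
    history = [x]  # history[k] holds the values at iteration k
    for _ in range(max_iterations):
        nxt = step(x)
        history.append(nxt)
        error = max(abs(a - b) for a, b in zip(nxt, x))
        x = nxt
        if error < epsilon:
            break
    return x, history


solution, history = solve()

# Reference pairs whose similarity reaches the reconciliation threshold.
T_REC = 0.80
reconciled = [i + 1 for i, score in enumerate(solution) if score >= T_REC]

for k, values in enumerate(history):
    print(k, values)
print("reconciled pairs:", reconciled)
```

Each row printed by the loop corresponds to one column of Table 1, and `reconciled` lists the indices of the variables whose score reaches the threshold.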
Experiments
L2R and N2R have been implemented and tested on the Cora benchmark (used by (Dong et al., 2005; Parag & Domingos, 2005)). It is a collection of 1295 citations of 112 different research papers in computer science. For this data set, the UNA is not stated and the RDF facts describe references that belong to three different classes (Article, Conference, Person). We have designed a simple RDFS schema for the scientific publication domain, which we have enriched with disjunction constraints (e.g. DISJOINT(Article, Conference)), a set of functional property constraints (e.g. PF(published), PF(confName)) and a set of inverse functional property constraints (e.g. PFI(title, year, type), PFI(confName, confYear)). The recall and the precision can be easily obtained by computing the ratio of the reconciliations or non-reconciliations obtained by L2R and N2R among those that are provided in the benchmark.
L2R Results
Since the set of reconciliations and the set of non-reconciliations are obtained by a logical resolution-based algorithm, the precision is 100% by construction. The measure that is meaningful to evaluate in our experiments is therefore the recall. We focus on the results obtained for the Article and Conference classes, which contain 1295 and 1292 references respectively.
As presented in the “RDFS+” column of Table 2, the recall is 50.7%. This breaks down into a recall of 52.7% computed on the REC subset and a recall of 50.6% computed on the NREC subset.
For this data set, the RDFS+ schema can be easily enriched by declaring that the property confYear is discriminant. When this property is exploited, the recall on the NREC subset grows to 94.9%, as shown in the “RDFS+ & DP” column. This significant improvement is due to the chaining of different reconciliation rules: the non-reconciliations of references to conferences whose confYear values differ entail in turn non-reconciliations of the associated articles, by exploiting the constraint PF(published).
This recall is comparable to (though slightly lower than) the recall obtained on the same data set by supervised methods like e.g., (Dong et
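The recall and precision ratios used in this evaluation can be sketched as follows. This is a minimal illustration under stated assumptions: decisions are modelled as unordered pairs of reference identifiers, the helper names `pair`, `recall` and `precision` are hypothetical, and the toy data below is invented for the example and is not taken from Cora.

```python
def pair(a, b):
    """Normalise an unordered pair of reference identifiers."""
    return tuple(sorted((a, b)))


def recall(found, expected):
    """Share of the benchmark decisions that the method recovered."""
    return len(found & expected) / len(expected) if expected else 0.0


def precision(found, expected):
    """Share of the method's decisions that the benchmark confirms."""
    return len(found & expected) / len(found) if found else 0.0


# Invented toy data: reconciliations listed in a benchmark (REC subset)
# and the reconciliations produced by a method under evaluation.
benchmark_rec = {pair("a1", "a2"), pair("a3", "a4"),
                 pair("c1", "c2"), pair("c3", "c4")}
found_rec = {pair("a2", "a1"), pair("c1", "c2")}

print("recall on REC:", recall(found_rec, benchmark_rec))
print("precision on REC:", precision(found_rec, benchmark_rec))
```

The same two functions apply unchanged to the NREC subset by passing sets of non-reconciliation pairs instead.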
 