In order to improve their efficiency, some recent methods exploit knowledge which is either learnt by using supervised algorithms or explicitly specified by a domain expert. For instance, in (Dey et al., 1998b; Dong et al., 2005), knowledge about the impact of the different attributes or relations is encoded in weights by an expert or learnt on labelled data. However, these methods are time-consuming and depend on human expertise, either for labelling the training data or for declaratively specifying additional knowledge for the reference reconciliation. Both the L2R and N2R methods exploit the semantics of the schema or of the data, expressed by a set of constraints. They are unsupervised methods since neither L2R nor N2R needs labelled data.
Most of the existing methods infer only reconciliation decisions. However, some methods infer non-reconciliation decisions in order to reduce the reconciliation space. This is the case for the so-called blocking methods introduced in (Newcombe, 1962) and used in recent approaches such as (Baxter et al., 2003).
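
The general idea behind blocking can be illustrated with a minimal Python sketch. It is not the procedure of (Newcombe, 1962) or (Baxter et al., 2003): the blocking key (the first three letters of a name), the record layout and the function names are all assumptions made for illustration. Records are grouped by key, and only pairs that share a block remain candidates for reconciliation; every other pair is implicitly treated as a non-reconciliation.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: first three letters of the name, lower-cased.
    return record["name"][:3].lower()


def candidate_pairs(records):
    """Return the pairs of record ids that fall into the same block."""
    blocks = defaultdict(list)
    for rec_id, record in records.items():
        blocks[blocking_key(record)].append(rec_id)

    pairs = set()
    for ids in blocks.values():
        # Only records inside the same block are compared later on.
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Illustrative data, not taken from the source.
records = {
    "r1": {"name": "Louvre Museum"},
    "r2": {"name": "Louvre"},
    "r3": {"name": "Orsay Museum"},
    "r4": {"name": "Orsay"},
}

pairs = candidate_pairs(records)
```

With these four records, six pairs are possible in total, but only the two pairs that share a key are kept for comparison. A key that is too coarse keeps too many pairs, while a key that is too strict may separate references that should be reconciled.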
THE PICSEL3 DATA EXTRACTION, TRANSFORMATION AND INTEGRATION APPROACH
In this section, we first define the data model used to represent the ontology and the data, the external XML sources and the mappings. In a second sub-section, we present the data extraction and transformation tasks and then the two reconciliation techniques (L2R and N2R), followed by a summary of the results that we have obtained by applying these methods to data sets related to scientific publications.
Data Model, XML Sources and Mappings
We first describe the data model used to represent the ontology O. This model is called RDFS+ because it extends RDFS with some OWL-DL primitives and SWRL rules, both being used to state constraints that enrich the semantics of the classes and properties declared in RDFS. Then we describe the XML sources we are interested in and the mappings that are automatically generated and then used as inputs of the data extraction and transformation process.
The RDFS+ Data Model
RDFS+ can be viewed as a fragment of the relational model (restricted to unary and binary relations) enriched with typing constraints, inclusion and exclusion between relations, and functional dependencies.
The Schema and its Constraints
An RDFS schema consists of a set of classes (unary relations) organized in a taxonomy and a set of typed properties (binary relations). These properties can also be organized in a taxonomy of properties. Two kinds of properties can be distinguished in RDFS: the so-called relations, the domain and the range of which are classes, and the so-called attributes, the domain of which is a class and the range of which is a set of basic values (e.g. Integer, Date, Literal). For example, in the RDFS schema presented in Figure 2, we have a relation located having as domain the class CulturalPlace and as range the class Address. We also have an attribute name having as domain the class CulturalPlace and as range the data type Literal.
We allow the declaration of constraints expressed in OWL-DL or in SWRL in order to enrich the RDFS schema. The constraints that we consider are of the following types:
Constraints of disjunction between classes: DISJOINT(C, D) is used to declare that the two classes C and D are disjoint, for example: DISJOINT(CulturalPlace, Artist).
Constraints of functionality of properties: PF(P) is used to declare that the property P