Data Extraction, Transformation and Integration Guided by an Ontology - Data Warehousing Design and Advanced Engineering Applications


• Translation of the constraints on the data sources.

The UNA assumption, if it is stated on the sources S1 and S2, is translated automatically by four rules. For example, the following rule R1 expresses the fact that two distinct references coming from the same source cannot be reconciled.

R1: Src1(x) ∧ Src1(y) ∧ (x≠y) ⇒ ¬Reconcile(x,y)

where Src_i(x) means that the reference x comes from the source S_i.

Analogous rules express that one reference coming from a source S_i can be reconciled with at most one reference coming from a source S_j. Similarly, two rules are generated for translating LUNA semantics.

• Translation of the schema constraints.

For each relation R declared as functional by the constraint PF(R), the following rule R6.1(R) is generated:

R6.1(R): Reconcile(x, y) ∧ R(x, z) ∧ R(y, w) ⇒ Reconcile(z, w)

For example, the following rule is generated concerning the relation located, which relates references of cultural places to references of addresses and which is declared functional:

R6.1(located): Reconcile(x, y) ∧ located(x, z) ∧ located(y, w) ⇒ Reconcile(z, w)

For each attribute A declared as functional by the axiom PF(A), a similar rule which concludes on SynVals is generated.

Likewise, analogous rules are generated for each relation R and each attribute A declared as inverse functional. Rules are also generated for translating combined constraints PF(P1, ..., Pn) and PFI(P1, ..., Pn) of (inverse) functionality. For example, the declaration PFI(paintedBy, paintingName) states a composed functional dependency which expresses that the artist who painted a painting, jointly with the painting's name, functionally determines that painting.

For each pair of classes C and D involved in a DISJOINT(C, D) statement declared in the schema, or such that their disjointness is inferred by inheritance, a rule is generated to express the fact that their references cannot be reconciled. A transitivity rule allows inferring new reconciliation decisions by applying transitivity to the set of already inferred reconciliations.

See (Saïs et al., 2009) for a complete description of the generation process of reconciliation rules.

Reasoning Method for Reference Reconciliation

In order to infer sure reconciliation and non-reconciliation decisions, we apply an automatic reasoning method based on the resolution principle (Robinson, 1965; Chang & Lee, 1997). This method applies to the clausal form of the set of rules R described above and a set of facts F describing the data, which is generated as follows.

• Generation of the set of facts.

The set of RDF facts corresponding to the description of the data in the two sources S1 and S2 is augmented with the generation of:

  • new class-facts, relation-facts and attribute-facts derived from the domain and range constraints that are declared in RDFS for properties, and from the subsumption statements;

  • facts of the form Src1(i) and Src2(j);

  • synonymy facts of the form SynVals(v1, v2) for each pair (v1, v2) of basic values that are identical (up to some punctuation or case variations);

  • non-synonymy facts of the form ¬SynVals(v1, v2) for each pair (v1, v2) of distinct basic values of a functional attribute for which it is known that each possible value has a single form. For instance, ¬SynVals(“France”, “Algeria”) can be added.
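The fact-generation step, together with the rules R1 and R6.1(located) given in the text, can be illustrated with a small Python sketch. It is only a toy under stated assumptions: facts are represented as plain tuples, all function names (normalize, src_facts, syn_facts, non_syn_facts, r1_non_reconciliations, apply_r6_1) and the sample references are hypothetical, and R6.1 is applied once by direct pattern matching rather than by the resolution procedure on which the method relies.

```python
import string
from itertools import combinations

# All names and data below are hypothetical and serve only as an illustration.

def normalize(value):
    """Lower-case a basic value and drop punctuation and whitespace."""
    drop = set(string.punctuation + string.whitespace)
    return "".join(ch for ch in value.lower() if ch not in drop)

def src_facts(refs_s1, refs_s2):
    """Src1(i) for each reference i of S1 and Src2(j) for each reference j of S2."""
    return {("Src1", i) for i in refs_s1} | {("Src2", j) for j in refs_s2}

def syn_facts(values):
    """SynVals(v1, v2) for spellings identical up to punctuation or case."""
    return {("SynVals", v1, v2)
            for v1, v2 in combinations(sorted(values), 2)
            if normalize(v1) == normalize(v2)}

def non_syn_facts(single_form_values):
    """¬SynVals(v1, v2) for distinct values known to have a single form."""
    return {("NotSynVals", v1, v2)
            for v1, v2 in combinations(sorted(single_form_values), 2)}

def r1_non_reconciliations(refs_same_source):
    """R1: two distinct references from the same source cannot be reconciled."""
    return {("NotReconcile", x, y)
            for x, y in combinations(sorted(refs_same_source), 2)}

def apply_r6_1(reconciled, relation):
    """R6.1(R): Reconcile(x, y) ∧ R(x, z) ∧ R(y, w) ⇒ Reconcile(z, w)."""
    return {(z, w)
            for (x, y) in reconciled
            for (a, z) in relation if a == x
            for (b, w) in relation if b == y}

# Toy data (assumed): one cultural place and one address per source.
refs_s1 = {"m1", "a1"}
refs_s2 = {"m2", "a2"}
located = {("m1", "a1"), ("m2", "a2")}

facts = src_facts(refs_s1, refs_s2)
facts |= syn_facts({"Le Louvre", "le louvre.", "Orsay"})
facts |= non_syn_facts({"France", "Algeria"})
facts |= r1_non_reconciliations(refs_s1)

# Suppose Reconcile(m1, m2) has already been inferred; since located is
# functional, R6.1(located) then yields Reconcile(a1, a2).
new_reconciliations = apply_r6_1({("m1", "m2")}, located)

print(sorted(facts))
print(new_reconciliations)
```

Keeping facts as tuples in a set makes duplicates harmless and lets each generation step be added independently; a real implementation would instead translate rules and facts to clausal form before reasoning.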

• Resolution-based algorithm for reference reconciliation.
