Data Extraction, Transformation and Integration Guided by an Ontology - Data Warehousing Design and Advanced Engineering Applications


Figure 3. Example of a DTD tree

provides examples of data coming from two RDF data sources S1 and S2, which conform to the same RDFS+ schema describing the cultural application previously mentioned.

Example 1: An example of RDF data

Source S1: Museum(r607); name(r607, “Le Louvre”); located(r607, d1e5); Address(d1e5); town(d1e5, “Paris”); contains(r607, p112); paintingName(p112, “La Joconde”);

Source S2: Museum(r208); name(r208, “musée du Louvre”); located(r208, l6f2); Address(l6f2); town(l6f2, “ville de Paris”); contains(r208, p222); paintingName(p222, “Iris”); contains(r208, p232); paintingName(p232, “Joconde”);

We consider two kinds of axioms accounting for the Unique Name Assumption (UNA) and the Local Unique Name Assumption (denoted LUNA). The UNA states that two data items of the same data source having distinct references refer to two different real-world entities (and thus cannot be reconciled). Such an assumption is valid when a data source is clean. The LUNA is weaker than the UNA, and states that all the references related to the same reference by a relation refer to real-world entities that are pairwise distinct.
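To make the difference between the two axioms concrete, the following minimal Python sketch derives the pairs of references that each axiom declares distinct, using the reference-valued facts of Example 1. The triple encoding and the function names `una_pairs` and `luna_pairs` are illustrative assumptions, not part of the original approach. The sketch also assumes that "related to the same reference by a relation" means "objects of the same relation with the same subject".

```python
from itertools import combinations

# Reference-valued facts from Example 1, written as (relation, subject, object).
# Facts whose object is a literal (name, town, paintingName) are left out,
# because the axioms below only concern references.
S1 = [
    ("located", "r607", "d1e5"),
    ("contains", "r607", "p112"),
]
S2 = [
    ("located", "r208", "l6f2"),
    ("contains", "r208", "p222"),
    ("contains", "r208", "p232"),
]


def una_pairs(facts):
    """Pairs of references that the UNA declares distinct within one source."""
    refs = sorted({ref for _, subj, obj in facts for ref in (subj, obj)})
    return {frozenset(pair) for pair in combinations(refs, 2)}


def luna_pairs(facts):
    """Pairs that the LUNA declares distinct.

    Assumed reading: two references are distinct when they are objects of
    the same relation for the same subject reference.
    """
    objects_by_key = {}
    for relation, subj, obj in facts:
        objects_by_key.setdefault((relation, subj), set()).add(obj)
    return {
        frozenset(pair)
        for objs in objects_by_key.values()
        for pair in combinations(sorted(objs), 2)
    }
```

Under these assumptions, the LUNA applied to S2 yields the single pair {p222, p232} (two paintings contained in the same museum), whereas the UNA applied to S2 yields all six pairs over its four references. Every pair produced by the LUNA is also produced by the UNA, which is the sense in which the LUNA is the weaker axiom.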

The XML Sources

The XML sources that we are interested in are valid documents, instances of a DTD that defines their structure. We consider DTDs without entities or notations. A DTD can be represented as an acyclic directed graph with one node for each element definition. The links between two nodes are composition links. The attributes associated with the elements in a DTD are associated with the element nodes in the graph representing the DTD. Because the DTDs are acyclic, their associated graph may be represented as a forest of trees, whose roots correspond to entry points in the graph (nodes without predecessors). Nodes shared in the graph by several trees are duplicated in order to make these trees independent of each other.

Figure 3 is an example of a DTD of a source to be integrated. It is represented by the tree T1. A fragment of an XML document conforming to the DTD tree T1 is presented in Figure 4.

The Mappings

Mappings are computed in a semi-automatic way. They are links between the ontology O and a DTD tree D (elements or attributes). The format of the mappings for the classes and the properties of O is described just below.

When c1 is a concept of O, the format of the mappings may be:

• c1 ↔ //e

• c1 ↔ //e/@att

• c1 ↔ //e/[@att = 'val']/@att

When R is a relation between c1 and c2 of O such that ∃ c1 ↔ //a and c2 ↔ //b, the format of the mapping is:

R(c1, c2) ↔ R(//a, //a/…/b)

When A is an attribute of c1 represented in the ontology O such that ∃ c1 ↔ //a and b is mapped to A in T, the format of the mapping is:

A of c1 ↔ A(//a, //a/…/b)

In this format, ↔ indicates a mapping link between entities in O and entities in T, defined by their path using XPath (Berglund et al., 2007) in the associated graph. e refers to an element in T, and @att refers to the attribute att.
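The mapping formats above can be illustrated by evaluating the corresponding paths on a small document. The following Python sketch uses the standard `xml.etree.ElementTree` module, which supports only a subset of XPath: attribute values are therefore read with `get()` rather than with a trailing `/@att` step, and the predicate is written in the standard form `e[@att='val']`. The XML document, its element and attribute names, and the helper functions are hypothetical and chosen only for illustration. The elided path `//a/…/b` is interpreted here, as an assumption, as "any b that is a descendant of a".

```python
import xml.etree.ElementTree as ET

# Hypothetical document; it is not the fragment shown in Figure 4.
XML = """
<museums>
  <museum id="m1">
    <name>Le Louvre</name>
    <painting kind="oil"><title>La Joconde</title></painting>
    <painting kind="sketch"><title>Iris</title></painting>
  </museum>
</museums>
"""
root = ET.fromstring(XML)


def concept_elements(root, e):
    """c1 <-> //e : every element named e below the root."""
    return root.findall(f".//{e}")


def concept_attribute_values(root, e, att):
    """c1 <-> //e/@att : values of attribute att on the elements named e."""
    return [el.get(att) for el in root.findall(f".//{e}[@{att}]")]


def concept_filtered_values(root, e, att, val):
    """c1 <-> //e[@att='val']/@att : same, restricted to att equal to val."""
    return [el.get(att) for el in root.findall(f".//{e}[@{att}='{val}']")]


def relation_pairs(root, a, b):
    """R(//a, //a/.../b) : pairs (a element, b element nested inside it)."""
    return [
        (a_el, b_el)
        for a_el in root.findall(f".//{a}")
        for b_el in a_el.findall(f".//{b}")
    ]


# Usage: pair each museum identifier with the titles of its paintings.
contains = [
    (a_el.get("id"), b_el.findtext("title"))
    for a_el, b_el in relation_pairs(root, "museum", "painting")
]
```

The attribute mapping A(//a, //a/…/b) can be obtained in the same way by keeping the text content of each b element instead of the element itself. A full XPath engine would allow the paths of the mappings to be evaluated directly, without these helpers.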
