Ontology-Based Integration of Heterogeneous, Incomplete and Imprecise Data Dedicated to a Decision Support System for Food Safety - Data Warehousing Design and Advanced Engineering Applications


has been experimentally tested on three different domains (microbial risk in food, chemical risk in food and aeronautics): three OWL ontologies have been created within a couple of hours thanks to preexisting information retrieved from local databases and a very simple tool which automatically translates CSV files containing the metadata into an OWL ontology; second, the structure of data tables is highly variable (even tables in the same paper do not have the same structure) and terms appear in tables with no linguistic context, which invalidates annotation techniques that learn wrappers based on structure and/or textual context, such as Lixto (Baumgartner & al., 2001) or BWI (Freitag & Kushmerick, 2000). Our approach can be compared to the construction of frames from tables described in Pivk & al. (2004), but they use a generic ontology and create new relations according to the table signature, whereas we want to recognize predefined relations in an ontology specific to the target domain.
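
The CSV-to-OWL translation mentioned above can be pictured with a minimal sketch. The tool itself is not described in the text, so everything here is an assumption made for illustration: the two-column CSV layout (`term`, `parent`), the `ex:` namespace, and the choice of Turtle as the output syntax.

```python
import csv
import io

# Hypothetical namespace; the real ontologies would use their own IRIs.
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <http://example.org/onto#> .\n"
)


def local_name(term):
    """Turn a free-text term into a safe local name (non-alphanumerics become '_')."""
    return "".join(ch if ch.isalnum() else "_" for ch in term.strip())


def csv_to_turtle(csv_text):
    """Translate rows of (term, parent) into OWL class declarations in Turtle.

    Assumes a header line 'term,parent' and labels without double quotes.
    """
    out = [PREFIXES]
    for row in csv.DictReader(io.StringIO(csv_text)):
        term = row["term"].strip()
        parent = (row.get("parent") or "").strip()
        stmt = f'ex:{local_name(term)} a owl:Class ;\n    rdfs:label "{term}"'
        if parent:
            # A non-empty parent column becomes a subclass axiom.
            stmt += f" ;\n    rdfs:subClassOf ex:{local_name(parent)}"
        out.append(stmt + " .")
    return "\n".join(out)


if __name__ == "__main__":
    print(csv_to_turtle("term,parent\nFood product,\nFresh meat,Food product\n"))
```

Because the metadata already sit in local databases, a translation of this kind is mostly mechanical, which is consistent with the short construction times reported above.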

In the framework of flexible querying of XML databases, different approaches have been proposed to extend either XPath or SPARQL. Campi & al. (2006) propose FuzzyXPath, a fuzzy extension of XPath to query XML documents. Extensions are of two kinds: (i) the 'deep-similar' function permits a relaxed comparison in terms of structure between the query tree and the data tree; (ii) the 'close' and 'similar' predicates extend the equality comparison to a similarity comparison between the content of a node and a given value expressed in the query. Hurtado & al. (2006) propose an extension of the SPARQL 'Optional' clause (called Relax). This clause makes it possible to compute a set of generalizations of the RDF triples involved in the SPARQL query, using in particular the declarations made in the RDF Schema. Corby & al. (2004) also propose the same kind of extension of the SPARQL query, using a distance function applied to the classes and properties of the RDF Schema. The originality of our approach in flexible querying is that we propose a complete and integrated solution which makes it possible (1) to annotate data tables with the vocabulary defined in an OWL ontology, and (2) to execute a flexible query on the annotated tables using the same vocabulary and taking into account the pertinence degrees generated by the annotation system.
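
To make the idea of relaxed matching concrete, here is a minimal sketch that combines a class-hierarchy distance with the pertinence degree of an annotation. It is not the method of any of the cited systems: the toy hierarchy, the similarity formula `1 / (1 + path length)`, the use of `min` to combine degrees, and the threshold are all assumptions chosen for illustration.

```python
# Toy subclass hierarchy (child -> parent); hypothetical, not from the chapter.
PARENT = {
    "Fresh meat": "Meat",
    "Cooked meat": "Meat",
    "Meat": "Food product",
    "Milk": "Food product",
}


def ancestors(cls):
    """Yield (class, number of subclass steps), starting with cls itself."""
    steps = 0
    while cls is not None:
        yield cls, steps
        cls = PARENT.get(cls)
        steps += 1


def similarity(queried, annotated):
    """1 / (1 + path length through the closest common ancestor), or 0.0 if none."""
    up = dict(ancestors(queried))
    for cls, steps in ancestors(annotated):
        if cls in up:
            return 1.0 / (1 + up[cls] + steps)
    return 0.0


def flexible_query(queried, rows, threshold=0.3):
    """Rank annotated rows (row_id, class, pertinence) by relaxed match to `queried`."""
    results = []
    for row_id, cls, pertinence in rows:
        # Assumed combination: an answer is only as good as its weakest degree.
        score = min(similarity(queried, cls), pertinence)
        if score >= threshold:
            results.append((row_id, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    rows = [("t1", "Fresh meat", 0.9), ("t2", "Cooked meat", 0.8), ("t3", "Milk", 1.0)]
    print(flexible_query("Fresh meat", rows))
```

Querying and annotation share one vocabulary here, so the same hierarchy drives both the relaxation of the query and the interpretation of the pertinence degrees.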

Finally, the ontology alignment problem has been widely investigated in the literature (Castano & al., 2007; Euzenat & Shvaiko, 2007; Kalfoglou & Schorlemmer, 2003; Noy, 2004). Our originality is to treat that problem as a rule application problem where a source ontology, considered as a fact base, is aligned with a target one, considered as a rule base.
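
The fact-base and rule-base view of alignment can be sketched as follows. The representation is hypothetical: source concepts are stored as label facts, and each target concept contributes one rule that fires when a source label, once normalized, belongs to the rule's label set. The actual rule language used by the authors is not given in this section.

```python
def normalize(label):
    """Lowercase, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(label.lower().replace("-", " ").split())


# Fact base: the source ontology as (subject, predicate, object) facts.
SOURCE_FACTS = [
    ("src:C1", "label", "Fresh-Meat"),
    ("src:C2", "label", "pH value"),
    ("src:C3", "label", "Packaging"),
]

# Rule base: one rule per target concept (hypothetical format).
# "If a source concept has a label in `labels`, map it to `target`."
TARGET_RULES = [
    {"target": "tgt:FreshMeat", "labels": {"fresh meat"}},
    {"target": "tgt:pH", "labels": {"ph", "ph value"}},
]


def align(facts, rules):
    """Apply every rule to every label fact and collect the derived mappings."""
    mappings = set()
    for subject, predicate, obj in facts:
        if predicate != "label":
            continue
        for rule in rules:
            if normalize(obj) in rule["labels"]:
                mappings.add((subject, rule["target"]))
    return sorted(mappings)


if __name__ == "__main__":
    for source, target in align(SOURCE_FACTS, TARGET_RULES):
        print(source, "->", target)
```

Source concepts for which no rule fires, such as `src:C3` in this toy data, simply remain unaligned.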

FUTURE RESEARCH DIRECTIONS

The domain ontology is the central element of our data integration system. In the future, we want to carry on our work on ontology-based data integration.

First, we intend to enhance the performance of the annotation system using machine learning techniques (Doan & al., 2003) applied to the knowledge of the ontology, but without manual training on a subset of the corpus. For example, a new classifier for symbolic types can be added to the existing one and trained using the domain of values associated with the symbolic type in the ontology. Second, we want to integrate the user's opinion on the query result in order to improve the underlying semantic annotation process and consequently to enrich the ontology. Third, since our flexible querying system allows the user to uniformly query several sources indexed by the same ontology, we want to extend our system so that it can query several sources relying on distinct ontologies which have been previously aligned. Fourth, one important feature which must be added to @Web is the ability to detect that data included in tables retrieved from different Web documents are redundant. We want to use reference reconciliation methods (Sais & al., 2007) to deal with this problem.
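
As an illustration of the first direction, a classifier for symbolic types could in principle be trained from the ontology alone, with no hand-labelled tables. The sketch below is only one possible reading, not the planned design: the type names, the value domains, and the token-overlap scoring are all assumptions.

```python
from collections import Counter

# Hypothetical value domains attached to symbolic types in the ontology.
DOMAINS = {
    "FoodProduct": ["fresh meat", "cooked meat", "pasteurized milk", "soft cheese"],
    "Microorganism": ["listeria monocytogenes", "escherichia coli", "salmonella"],
}


def train(domains):
    """Build one token-frequency profile per symbolic type from its value domain."""
    return {
        type_name: Counter(tok for value in values for tok in value.lower().split())
        for type_name, values in domains.items()
    }


def classify(cells, profiles):
    """Return the type sharing the most tokens with the column cells, or None."""
    tokens = [tok for cell in cells for tok in cell.lower().split()]
    scores = {
        type_name: sum(1 for tok in tokens if tok in profile)
        for type_name, profile in profiles.items()
    }
    best = max(scores, key=scores.get)
    # No shared token at all means the column is left unclassified.
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    profiles = train(DOMAINS)
    print(classify(["Minced meat", "Raw milk"], profiles))
```

The training data come entirely from the ontology, which matches the stated goal of avoiding manual training on a subset of the corpus.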
