feeds the decision support system with data about
the chemical contamination and the consumption
of food products. The data integration system is
managed using a data warehouse approach: data
sources provided by external partners are
replicated locally and standardized using ETL
(Extract, Transform, Load) technology.
In this chapter, we present the ontology-based
data integration system, which addresses the
three data integration problems presented above:
data heterogeneity, data rarity and data
imprecision. The system proposes three different
ways to integrate data according to a domain
ontology. The first is a semantic annotation
process that allows a local database (the CONTA
local database), indexed by a domain ontology, to
be extended with data that have been extracted
from the Web and semantically annotated according
to this domain ontology. The second, which is an
original contribution of this chapter, is a
querying system that integrates the semantically
annotated Web data with the local data through a
uniform, flexible querying system relying on a
domain ontology (the CONTA ontology). The third
is a rule-based ontology alignment method that
finds correspondences between objects of a source
ontology (the CONSO ontology) and objects of a
target ontology (the CONTA ontology) according to
their characteristics and associated values.
These three ways of integrating data have been
designed using the Semantic Web approach, an
international initiative that proposes annotating
data sources with ontologies in order to manage
them more efficiently.
In this chapter, we first present the
ontology-based data integration system. We then
provide some background on the topic. Next, we
present current projects and future trends. We
conclude the chapter in the last section.
THE ONTOLOGY-BASED DATA
INTEGRATION SYSTEM
This section describes the construction steps of
the ontology-based data integration system of the
CARAT system. In the first subsection, we present
the filling of its data sources. In the second, we
present its querying system. In the third, we
present the alignment between objects of its two
data sources, which are indexed by distinct
ontologies.
Filling the Data Warehouse
There are two types of data available in the CARAT
system: contamination data and consumption data.
Both types concern food products, but their
content and their treatment differ. The
contamination data are measurements of the level
of chemical contamination in food products,
whereas the consumption data describe household
purchases of food products over a year.
The contamination data are stored in a relational
database, called the CONTA local database, which
has been defined and filled by our research team
from different sources. It is indexed by the CONTA
ontology. The consumption data are stored in a
relational database, called the CONSO database,
which is filled from the TNS WORLD PANEL source, a
private source of household purchases. It is
indexed by the CONSO ontology. Both databases are
filled using ETL technology.
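The ETL filling step mentioned above can be illustrated with a minimal sketch. It is not the CARAT implementation: the partner CSV, the table name contamination, its columns, and the unit conversion table are all assumptions made up for the example. It only shows the general shape of extracting rows from an external source, standardizing them, and loading them into a local relational replica.

```python
import csv
import io
import sqlite3

# Hypothetical partner export; column names and values are invented for illustration.
PARTNER_CSV = """product,contaminant,level,unit
Wheat flour,ochratoxin A,0.5,ug/kg
rice ,Ochratoxin A,0.0012,mg/kg
"""

# Assumed conversion factors to a single target unit (micrograms per kilogram).
TO_UG_PER_KG = {"ug/kg": 1.0, "mg/kg": 1000.0, "ng/g": 1.0}


def extract(text):
    """Extract: read raw rows from the partner's CSV export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names and convert every level to ug/kg."""
    records = []
    for row in rows:
        factor = TO_UG_PER_KG[row["unit"].strip()]
        records.append((
            row["product"].strip().lower(),
            row["contaminant"].strip().lower(),
            float(row["level"]) * factor,
        ))
    return records


def load(conn, records):
    """Load: write the standardized records into the local replica."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contamination ("
        "product TEXT, contaminant TEXT, level_ug_per_kg REAL)"
    )
    conn.executemany("INSERT INTO contamination VALUES (?, ?, ?)", records)
    conn.commit()


# An in-memory database stands in for the local relational database.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(PARTNER_CSV)))
rows = conn.execute(
    "SELECT product, contaminant, level_ug_per_kg "
    "FROM contamination ORDER BY product"
).fetchall()
```

Keeping the three steps as separate functions means that a new partner source only needs its own extract and transform steps, while the load step and the local schema stay unchanged.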
In this section, we focus on two original
characteristics of the contamination data that
must be taken into account when they are stored:
their imprecision and their rarity. On the one
hand, we propose to use fuzzy set theory to
represent imprecise data. On the other hand, we
propose to search for data on the Web and annotate
them using the CONTA ontology in order to extend
the CONTA local database. We first present the
structure of the CONTA ontology. We then present
the fuzzy set theory used to treat the imprecise