INTRODUCTION
Nowadays, the informational environment is characterized by strongly distributed, heterogeneous data. Complex applications such as knowledge extraction, data mining, learning and web applications use heterogeneous and distributed data sources (Boussaïd, Darmont, Bentayeb & Loudcher-Rabaseda, 2008). Thus, the need to integrate and manipulate large amounts of data keeps growing. In the absence of tools for heterogeneous data integration and of formalisms for modelling the integration of these data, we propose in this chapter early attempts to formalise the integration of heterogeneous data and their maintenance. Indeed, the data can be classified into three categories: structured (relational and object data), semi-structured (HTML, XML, graphs) and unstructured (text, images, sounds) (see figure 1).
Our contribution is twofold: the first part of our work concerns the beginning of the Data Warehouse (DW) life cycle, namely the building of a DW from heterogeneous sources, and the second part is related to the maintenance phase.
In the literature (Da Silva, Filha, Laender & Embley, 2002; Saccol & Heuser, 2002; Kim & Park, 2003; Beneventano, Bergamaschi, Castano, Antonellis, Ferrara, Guerra, Mandreoli, Ornetti & Vincini, 2002), there is no consensus on the meaning of heterogeneity. Depending on the domain and the type of application considered, heterogeneity has been treated and interpreted in several ways. Given this ambiguity, we adopt, in our integration work, the definitions below, which will enable us to treat all categories together.
Data sources are said to be heterogeneous if they satisfy one of the following two properties:
1) They belong to the same category of data but they have different modellings;
2) They belong to different data categories.
Thus, the integration of a relational database (DB) and an object-relational one is an example of handling heterogeneous data sources. The same holds for a relational DB and an XML DB.
A DW results from the integration of data sources. It is a subject-oriented, integrated, time-variant,
Figure 1. The heterogeneity of DW's sources
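The two heterogeneity properties stated in this introduction can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the formalism of the chapter: the names DataSource, CATEGORY_OF_MODEL and heterogeneity are hypothetical, "modelling" is read here as the data model of a source, and placing the object-relational model in the structured category is an assumption made so that the relational / object-relational example falls under property 1.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup from a data model to one of the three categories
# named in the text. Listing "object-relational" as structured is an
# assumption made for this illustration.
CATEGORY_OF_MODEL = {
    "relational": "structured",
    "object": "structured",
    "object-relational": "structured",
    "html": "semi-structured",
    "xml": "semi-structured",
    "graph": "semi-structured",
    "text": "unstructured",
    "image": "unstructured",
    "sound": "unstructured",
}


@dataclass(frozen=True)
class DataSource:
    """A data source described only by its name and its data model."""
    name: str
    model: str

    @property
    def category(self) -> str:
        return CATEGORY_OF_MODEL[self.model]


def heterogeneity(a: DataSource, b: DataSource) -> Optional[str]:
    """Return which of the two properties holds, or None if neither does."""
    if a.category != b.category:
        # Property 2: the sources belong to different data categories.
        return "different categories"
    if a.model != b.model:
        # Property 1: same category, but different modellings.
        return "same category, different modellings"
    return None


relational_db = DataSource("sales", "relational")
object_relational_db = DataSource("customers", "object-relational")
xml_db = DataSource("catalogue", "xml")

print(heterogeneity(relational_db, object_relational_db))
print(heterogeneity(relational_db, xml_db))
```

Under these assumptions, the relational / object-relational pair is reported under property 1 and the relational / XML pair under property 2, matching the two examples given in the text.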