Database Reference
pose a single, often relational data source; cf.
(List, Bruckner, Machaczek, & Schiefer, 2002),
(Golfarelli, Maio, & Rizzi, 1998), (Cabibbo
& Torlone, 1998), (Moody & Kortink, 2000),
(Prat, Akoka, & Comyn-Wattiau, 2006), (Zribi
& Feki, 2007), (Golfarelli, Rizzi, & Vrdoljak,
2001), (Vrdoljak, Banek, & Rizzi, 2003), (Jensen,
Møller, & Pedersen, 2001). However, due to
international competition, enterprises are increas-
ingly forced to enrich their own data repository
with data coming from external sources. Besides
data received from partners, the web constitutes
the main external data source for all enterprises.
For instance, an enterprise may need to retrieve
exchange rates from the web in order to analyze
how the quantities of the products it sells vary
with those exchange rates over a given period
of time.
To deal with such an open data source, a DW/DM
construction approach must therefore overcome
the main difficulty behind the use of multiple
data sources: the structural and semantic
heterogeneity of the sources. In fact, even though the
relational data model is the most commonly used
model (Wikipedia encyclopedia, 2008), a DW
construction approach must now deal with other
data types, in particular XML documents, which
represent the dominant data type on the web.
Semantic data heterogeneity, on the other hand,
comes into play when the internal and external
data sources are complementary, e.g., in the case of
transactional data between partners. This type of
heterogeneity remains a challenging problem that
can be treated either at the data source level or the
DW/DM level (Boufares & Hamdoun, 2005).
This chapter deals with structural data
heterogeneity when designing a data mart. More
precisely, it presents a DM design method that
starts from both a relational database source and
XML documents compliant with a given DTD.
Besides considering these two types of data struc-
tures, our method has three additional advantages.
First, it supports DSS development centered
on decision makers: it assists them in defining
their analytical needs by proposing all analytical
subjects that could be automatically extracted
from their data sources; the automatic extraction
of DM schemas distinguishes our method from
currently proposed ones. Secondly, it guarantees
that the extracted subjects are loadable from the
enterprise information system and/or the external
data sources. The third advantage of our design
method is its genericity: it is domain independent,
since it relies on the structural properties of the
data sources rather than on their semantics.
It automatically applies a set of rules to extract,
from the relational database and XML docu-
ments, all possible facts with their dimensions
and hierarchies.
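To make the idea of structure-driven extraction concrete, the following minimal sketch proposes fact candidates from a relational schema using only keys and attribute types. The heuristic (a relation with at least one foreign key and at least one numeric non-key attribute is a fact candidate) is an assumption made for illustration, not the rule set of our method; the `Relation` class, the `candidate_facts` function and the sample schema are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Relation:
    """Hypothetical, simplified description of one relation of the source."""
    name: str
    primary_key: List[str]
    foreign_keys: Dict[str, str]  # attribute -> referenced relation
    numeric_attrs: List[str] = field(default_factory=list)


def candidate_facts(relations):
    """Return {relation name: {"measures": [...], "dimensions": [...]}}.

    Assumed heuristic (illustration only): a relation is a fact candidate
    when it references other relations and owns numeric non-key attributes.
    """
    result = {}
    for rel in relations:
        # Numeric attributes that are not part of any key are measure candidates.
        measures = [a for a in rel.numeric_attrs
                    if a not in rel.primary_key and a not in rel.foreign_keys]
        # Each referenced relation is a dimension candidate.
        dimensions = sorted(set(rel.foreign_keys.values()))
        if measures and dimensions:
            result[rel.name] = {"measures": measures, "dimensions": dimensions}
    return result


schema = [
    Relation("customer", ["cust_id"], {}),
    Relation("product", ["prod_id"], {}, ["unit_price"]),
    Relation("sale", ["sale_id"],
             {"cust_id": "customer", "prod_id": "product"},
             ["quantity", "amount"]),
]
facts = candidate_facts(schema)
```

Because the heuristic looks only at keys and attribute types, it never consults the meaning of the data, which is what makes a structure-based approach domain independent.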
To achieve these advantages, our method oper-
ates in four steps. First, it structurally homogenizes
the two types of data sources by converting the
DTD into a relational model. Secondly, it classifies
the set of relations derived from both the converted
DTD and the repository of the source relational
DBMS. Thirdly, it uses this classification to
automatically identify the facts, measures, dimensions
and their attributes organized into hierarchies;
these identified multidimensional elements are
modeled as star DM schemas. Finally, the resulting
DM schemas can be manually adapted by
the decision makers/designers to reflect their
particular analytical needs. The automatic steps
of our design method allowed us to incorporate
it into a CASE toolset that also supports
interactive DM schema adaptation.
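The first step, converting a DTD into a relational model, can be sketched as follows. This is a deliberately simplified illustration under stated assumptions, not the conversion procedure of our method: it handles only sequence content models of the form `(a, b*, c)`, ignores attribute lists and choices, maps every element that has sub-elements to a relation with a surrogate `<name>_id` key, and links a nested complex element to its parent through a foreign key. The function name `dtd_to_relations` and the sample DTD are hypothetical.

```python
import re

# Matches declarations such as <!ELEMENT rate (currency, day, value)>
ELEMENT_RE = re.compile(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)\s*>")


def dtd_to_relations(dtd_text):
    """Map a (very restricted) DTD to {relation name: [column names]}."""
    # Element name -> names listed in its content model, occurrence marks stripped.
    content = {}
    for name, model in ELEMENT_RE.findall(dtd_text):
        parts = [p.strip().rstrip("*+?") for p in model.split(",")]
        content[name] = [p for p in parts if p]

    # An element is "simple" when its only content is #PCDATA.
    simple = {n for n, parts in content.items() if parts == ["#PCDATA"]}

    relations = {}
    for name, parts in content.items():
        if name in simple:
            continue  # simple elements become columns, not relations
        columns = relations.setdefault(name, [name + "_id"])
        for child in parts:
            if child in simple:
                columns.append(child)
            elif child in content:
                # Complex child: its relation carries a foreign key to the parent.
                relations.setdefault(child, [child + "_id"]).append(name + "_id")
    return relations


SAMPLE_DTD = """
<!ELEMENT rates (rate*)>
<!ELEMENT rate (currency, day, value)>
<!ELEMENT currency (#PCDATA)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT value (#PCDATA)>
"""
relations = dtd_to_relations(SAMPLE_DTD)
```

Once the DTD is expressed as relations in this way, the same classification rules can be applied uniformly to relations coming from the XML documents and from the relational database.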
The remainder of this chapter is organized as
follows. First, we review current DW design
approaches for relational and XML data sources.
Then, we illustrate our four-step DM design
method through a relational data source and a
set of XML documents extracted from the web.
Finally, we summarize the presented work and
outline our ongoing research efforts.