Figure 1. XML data warehousing and analysis process
and store data. Therefore, new efforts are needed
to integrate XML in classical business applica-
tions. Integrating heterogeneous and complex
information in DSSs requires special consider-
ation. Existing ETL (Extract, Transform, Load)
tools that organize data into a common syntax are
indeed ill-adapted to complex data. Furthermore,
if XML documents must be prepared for future
OLAP analyses, storing them in a data repository
is not enough. Through these documents, a more
interesting abstraction level, completely oriented
toward analysis objectives, must be expressed.
It is thus necessary to structure XML data with
respect to a data warehouse multidimensional
reference model.
Though feeding data warehouses with XML
documents is becoming increasingly common,
methodological issues arise. The multidimensional
organization of data warehouses is indeed quite
different from the semi-structured organization of
XML documents. A data warehouse architecture is
subject-oriented, integrated, and consistent, and its
data are regularly refreshed to represent temporal
evolutions. How, then, can multidimensional design
be carried out with a semi-structured formalism
such as XML?
XML may be characterized by two aspects.
On one hand, it helps store and exchange data
through XML documents. On the other hand,
XML Schemas are relevant for describing data.
Multidimensional modeling helps structure data
for query and analysis. An XML formalism can
thus be used to describe the various elements of a
multidimensional model (Boussaïd et al., 2006).
But XML can only be considered as a logical and
physical description tool for future analysis tasks
on complex data. The reference conceptual model
remains the star schema and its derivatives.
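To make the idea of describing a multidimensional model in XML more
concrete, here is a minimal sketch in Python. It is not the formalism of
the cited work: the element names (star-schema, fact, measure,
dimension-ref, dimension, attribute) and the Sales example are
hypothetical, chosen only to show a star schema serialized as XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dimension:
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class StarSchema:
    fact: str
    measures: List[str]
    dimensions: List[Dimension]


def star_schema_to_xml(schema: StarSchema) -> ET.Element:
    """Serialize a star schema description as an XML tree.

    Tag and attribute names are illustrative assumptions, not a standard.
    """
    root = ET.Element("star-schema")
    fact = ET.SubElement(root, "fact", name=schema.fact)
    for measure in schema.measures:
        ET.SubElement(fact, "measure", name=measure)
    for dim in schema.dimensions:
        # The fact references each dimension; the dimension itself is
        # described once, next to the fact, with its attributes.
        ET.SubElement(fact, "dimension-ref", name=dim.name)
        dim_el = ET.SubElement(root, "dimension", name=dim.name)
        for attr in dim.attributes:
            ET.SubElement(dim_el, "attribute", name=attr)
    return root


# Hypothetical example schema: one fact, two measures, two dimensions.
schema = StarSchema(
    fact="Sales",
    measures=["quantity", "amount"],
    dimensions=[
        Dimension("Time", ["day", "month", "year"]),
        Dimension("Product", ["label", "category"]),
    ],
)
print(ET.tostring(star_schema_to_xml(schema), encoding="unicode"))
```

The conceptual model stays a star schema; XML only carries its logical
description, which matches the role the text assigns to it.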
One challenge we address in our approach is to
propose a multidimensional (and thus analysis-oriented)
model that is described in XML, in order to derive
a physical organization of XML documents that
contributes to performance enhancement. To support
this choice, we propose a modeling process
(Figure 1) that achieves complex data integration
(Boussaïd et al., 2003; Boussaïd et al., 2007;
Boussaïd et al., 2008).
We first design a conceptual UML model
for a complex object. This UML model is then
directly translated into an XML Schema, which
we view as a logical model. At the physical level,
XML documents that are valid against this logical
model may be mapped into a relational, object-relational,
or XML-native database. In this paper,
we focus on the last of these families of DBMSs. After
representing complex data as XML documents,
we physically integrate them into an Operational
Data Store (ODS), which is a buffer ahead of the
actual warehouse.
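The last steps of this process (representing a complex object as an XML
document, checking it against the logical model, and loading it into the
ODS) can be sketched as follows. This is only an illustration under
strong assumptions: the element names are hypothetical, a simple check
for required child elements stands in for XML Schema validation, and a
directory of files stands in for an XML-native DBMS.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical stand-in for the logical model: child elements that every
# complex-object document must contain. A real pipeline would validate
# documents against an XML Schema instead.
REQUIRED_CHILDREN = ("name", "source", "content")


def build_complex_object(name: str, source: str, content: str) -> ET.Element:
    """Represent one complex object as an XML document (illustrative tags)."""
    obj = ET.Element("complex-object")
    ET.SubElement(obj, "name").text = name
    ET.SubElement(obj, "source").text = source
    ET.SubElement(obj, "content").text = content
    return obj


def conforms(doc: ET.Element) -> bool:
    """Simplified structural check standing in for schema validation."""
    return doc.tag == "complex-object" and all(
        doc.find(child) is not None for child in REQUIRED_CHILDREN
    )


def load_into_ods(doc: ET.Element, ods_dir: Path, doc_id: str) -> Path:
    """Store a conforming document in the ODS (here, a plain directory)."""
    if not conforms(doc):
        raise ValueError(f"document {doc_id!r} does not match the logical model")
    ods_dir.mkdir(parents=True, exist_ok=True)
    target = ods_dir / f"{doc_id}.xml"
    ET.ElementTree(doc).write(str(target), encoding="utf-8", xml_declaration=True)
    return target


# Usage: build one document and load it into a temporary ODS directory.
ods = Path(tempfile.mkdtemp()) / "ods"
doc = build_complex_object("report-1", "intranet", "Quarterly sales summary")
stored = load_into_ods(doc, ods, "report-1")
print(stored.name)
```

Rejecting non-conforming documents before they reach the ODS keeps the
buffer consistent with the logical model, so that later warehousing
steps can rely on a known document structure.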
At this stage, it is already possible to mine the
stored XML documents directly, e.g., with XML
structure mining techniques. In addition, to further
analyze these documents' contents efficiently, it is
interesting to warehouse them, i.e., devise a mul-