XML-Based Design Approaches
(Golfarelli, Rizzi, & Vrdoljak, 2001) propose a method for the design of an XML DW from XML sources. Their method relies on two assumptions: the existence of a DTD for the XML documents, and the conformity of these documents to their corresponding DTD. This method designs a DW in three steps: 1) DTD simplification, mainly to flatten nested elements; 2) DTD graph creation, in order to represent the source structure graphically and simplify the manual fact selection; and 3) construction of an attribute tree for each fact in the graph; within the attribute tree, dimensions and measures are found among the nodes immediately linked to the chosen fact. In this method, the selection of facts and measures is manual and requires the intervention of an expert in the domain of the XML documents that will load the future data warehouse.
In an attempt to improve their method, (Vrdoljak, Banek, & Rizzi, 2003) developed a semi-automated process to design XML data warehouses from XML schemas. Once again, in this process, facts and measures are chosen manually. For each selected fact, they 1) build the dependency graph from the XML schema graph; 2) rearrange the dependency graph to define dimensions and measures; and then 3) create a logical schema. One main drawback of this method is that it requires intensive intervention by the designer. In addition to manually identifying the fact, dimensions and measures, the designer must also identify the many-to-many relationships among elements; these relationships are needed to construct the dependency graph.
The authors of (Rusu, Rahayu, & Taniar, 2004) and (Rusu, Rahayu, & Taniar, 2005) propose a generic method for building an XML DW for XML documents. Their method first applies a set of cleaning and integration operations in order to minimize the number of occurrences of dirty data, XML structural errors, duplications or inconsistencies. Secondly, it summarizes data in XML documents to extract only useful and valuable information in order to create other XML document(s) used for the construction of the dimensions. Thirdly, the method creates intermediate XML documents from the initial documents; this step focuses on determining the main activity data (data involved in queries, calculations, etc.). Thus, each intermediate document linked to other documents represents a fact document. Finally, the method updates/links all intermediate XML documents (fact and dimension documents) in such a way that relationships between keys are established and an XML DW is created. In this method, several sub-steps have to be accomplished manually by an expert in the XML document domain.
Furthermore, (Jensen, Møller, & Pedersen, 2001) studied how an OLAP cube can be obtained from XML data. To build a cube, the DTD of the XML documents is transformed into a UML class diagram using a set of transformation rules. Once the class diagram is obtained, the designer uses it to specify an OLAP DB model (named a UML snowflake diagram) through a graphical user interface. Finally, the UML snowflake diagram is transformed into relational structures to prepare the implementation of the OLAP cube. This approach is also used by (Ouaret, Bellatreche, & Boussaid, 2007), who start from XML schemas instead of the DTD.
In summary, we notice that:
1. although data-driven approaches for multidimensional design proceed automatically from either E/R or UML, they are based on conceptual models that companies either do not always have or hold only in obsolete versions;
2. the few proposed approaches for the design of XML-based DM/DW suppose that the designer is able to manually identify the interesting facts to be analyzed. However, this identification requires a high level of expertise in both the OLAP domain and the XML document domain;
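To make the role of manual fact selection more concrete, the following Python sketch loosely mirrors the DTD-based steps described for (Golfarelli, Rizzi, & Vrdoljak, 2001): a simplified DTD graph, a fact chosen by hand, an attribute tree built from that fact, and dimension and measure candidates taken from the nodes directly linked to it. It is an illustration under stated assumptions, not the authors' implementation. The graph `DTD_GRAPH`, the set `NUMERIC_LEAVES`, the function names, and the rule "numeric leaf means measure candidate" are all hypothetical and do not come from the source.

```python
# Hypothetical, already-simplified DTD graph (assumption for illustration):
# each element maps to the elements or attributes nested directly under it.
DTD_GRAPH = {
    "sale": ["date", "store", "product", "quantity", "amount"],
    "store": ["city"],
    "city": ["country"],
    "product": ["category"],
}

# Assumption: the designer knows which leaves hold numeric values.
NUMERIC_LEAVES = {"quantity", "amount"}


def attribute_tree(graph, fact):
    """Build the tree of nodes reachable from the fact, as nested dicts."""
    def expand(node, seen):
        # Skip nodes already on the current path so cycles cannot loop forever.
        children = [c for c in graph.get(node, []) if c not in seen]
        return {c: expand(c, seen | {c}) for c in children}
    return expand(fact, {fact})


def candidates(graph, fact, numeric_leaves):
    """Split the nodes directly linked to the fact into two candidate lists."""
    direct = graph.get(fact, [])
    # Illustrative rule only: a numeric leaf is proposed as a measure,
    # every other direct neighbour is proposed as a dimension.
    measures = [n for n in direct if n in numeric_leaves and n not in graph]
    dimensions = [n for n in direct if n not in measures]
    return dimensions, measures


# The fact itself is still picked manually, as in the methods surveyed.
tree = attribute_tree(DTD_GRAPH, "sale")
dimensions, measures = candidates(DTD_GRAPH, "sale", NUMERIC_LEAVES)
```

In this sketch, only the string `"sale"` encodes the designer's choice; everything after it is derived mechanically. That single manual input is the step that, according to the summary above, demands expertise in both OLAP and the XML documents.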