Designing Data Marts from XML and Relational Data Sources - Data Warehousing Design and Advanced Engineering Applications


XML-Based Design Approaches

(Golfarelli, Rizzi, & Vrdoljak, 2001) propose a method for the design of an XML DW from XML sources. Their method relies on two assumptions: the existence of a DTD for the XML documents, and the conformity of these documents to their corresponding DTD. This method designs a DW in three steps: 1) DTD simplification, mainly to flatten nested elements; 2) DTD graph creation, in order to represent the source structure graphically and simplify the manual fact selection; and 3) construction of an attribute tree for each fact in the graph; within the attribute tree, dimensions and measures are found among the nodes immediately linked to the chosen fact. In this method, the selection of facts and measures is manual and requires the intervention of an expert in the domain of the XML documents that will load the future data warehouse.

In an attempt to improve their method, (Vrdoljak, Banek, & Rizzi, 2003) developed a semi-automated process to design XML data warehouses from XML schemas. Once again, in this process, facts and measures are chosen manually. For each selected fact, they 1) build the dependency graph from the XML schema graph; 2) rearrange the dependency graph to define dimensions and measures; and then 3) create a logical schema. One main drawback of this method is that it requires intensive intervention by the designer. In addition to manually identifying the facts, dimensions and measures, the designer must also identify the many-to-many relationships among elements; these relationships are needed to construct the dependency graph.

On the other hand, the authors of (Rusu, Rahayu, & Taniar, 2004) and (Rusu, Rahayu, & Taniar, 2005) also propose a generic method for building an XML DW for XML documents. Their method first applies a set of cleaning and integration operations in order to minimize the number of occurrences of dirty data, XML structural errors, duplications or inconsistencies. Secondly, it summarizes data in XML documents to extract only useful and valuable information in order to create other XML document(s) used for the construction of the dimensions. Thirdly, the method creates intermediate XML documents from the initial documents; this step focuses on determining the main activity data (data involved in queries, calculations etc.). Thus, each intermediate document linked to other documents represents a fact document. Finally, the method updates/links all intermediate XML documents (fact and dimension documents), in such a way that relationships between keys are established, and an XML DW is created. In this method, several sub-steps have to be accomplished manually by an expert in the XML document domain.

Furthermore, (Jensen, Møller, & Pedersen, 2001) studied how an OLAP cube can be obtained from XML data. To build a cube, the DTD of the XML documents is transformed into a UML class diagram using a set of transformation rules. Once the class diagram is obtained, the designer uses it to specify an OLAP DB model (named a UML snowflake diagram) through a graphical user interface. Finally, the UML snowflake diagram is transformed into relational structures to prepare the implementation of the OLAP cube. This approach is also used by (Ouaret, Bellatreche, & Boussaid, 2007), who start from XML schemas instead of the DTD.

In summary, we notice that:

1. although data-driven approaches for multidimensional design proceed automatically from either E/R or UML, they are based on conceptual models that companies often do not have, or hold only in obsolete versions;

2. the few proposed approaches for the design of XML-based DM/DW suppose that the designer is able to manually identify the interesting facts to be analyzed. However, this identification requires a high level of expertise in both the OLAP domain and the XML document domain;
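
The manual fact selection noted in point 2 can be illustrated with the attribute-tree step of the (Golfarelli, Rizzi, & Vrdoljak, 2001) method. The following is only a minimal sketch under stated assumptions, not the authors' algorithm: the DTD is assumed to be already simplified and is represented by a hypothetical dictionary `DTD_GRAPH`, and the element names (`sale`, `store`, and so on) and the helper functions are invented for illustration. The fact is passed in by hand, and the candidates for dimensions and measures are then read off the nodes immediately linked to it.

```python
from collections import deque

# Hypothetical, already-simplified DTD written as a graph:
# each element maps to the elements/attributes it contains.
DTD_GRAPH = {
    "sale": ["date", "product", "store", "quantity", "amount"],
    "product": ["category"],
    "store": ["city"],
    "city": ["country"],
}


def build_attribute_tree(graph, fact):
    """Return {node: parent} for every node reachable from the chosen fact."""
    parent = {fact: None}
    queue = deque([fact])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in parent:  # keep the first path found, so the result is a tree
                parent[child] = node
                queue.append(child)
    return parent


def candidates(tree, fact):
    """Nodes immediately linked to the fact: candidate dimensions and measures."""
    return [node for node, par in tree.items() if par == fact]


def descendants(tree, node):
    """All nodes below `node` in the attribute tree, nearest levels first."""
    found = []
    frontier = [node]
    while frontier:
        current = frontier.pop(0)
        children = [n for n, p in tree.items() if p == current]
        found.extend(children)
        frontier.extend(children)
    return found


# The fact ("sale") is supplied by the designer; nothing here detects it.
tree = build_attribute_tree(DTD_GRAPH, "sale")
print(candidates(tree, "sale"))
print(descendants(tree, "store"))
```

Deciding which candidates are measures (here, presumably `quantity` and `amount`) and which are dimensions is deliberately left out of the sketch, since in the surveyed methods that decision rests with a domain expert.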
