Designing Data Marts from XML and Relational Data Sources - Data Warehousing Design and Advanced Engineering Applications

Database Reference

In-Depth Information

Relational Schema Integration

set of all transition trees and operates according

to the following five steps:

At this stage, we have produced one relational

schema from the relational database and a second

one from the DTD/XML documents. However,

these two schemas represent one “virtually single”

data source used to load the DM. Thus, they

must be integrated to represent conceptually one,

coherent database. In other words, any semantic

heterogeneity must be resolved at this stage.

Quite existing works treat semantic heteroge-

neity in relational databases, cf. , (Bright, Hurson,

& Pakzad, 1994), (Sheth, & Larson, 1990), (Ceri,

Widom, 1993) , (Hull,1997) (Zhang, & Yang,

2008) and XML document storage as relational

databases, cf. , (Ceri, Fraternali, & Paraboschi,

2000), (Deutsch, Fernandez, Suciu, 1999), (Lee,

& Chu, 2000), (Schmidt, Kersten, Windhouwer,

& Waas, 2000), (Kappel, Kapsammer, & Retsch-

itzegger, 2001). In addition, selected schema

heterogeneity issues were treated in the context

of XML and relational database integration, cf. ,

(Kappel, Kapsammer, Retschitzegger, 2000).

These works, used in the context of database

model transformation, can be also used to resolve

the semantic heterogeneity within our design

method.

Overall, the works on relational DB schema

integration proceed in three steps: Pre-integration

where the different source models are transformed

into the same model, schema correspondence that

resolves naming conflicts; and schema fusion that

produces a global schema by replacing equivalent

schemas, regrouping intersecting schemas and

collecting independent schemas. In our method,

the pre-integration step is treated through our pre-

treatment step, which leaves us with the schema

correspondence and fusion steps.

To illustrate the integration step, let us revisit

our running example (Figures 3 and 7) which re-

quires only the fusion step. The integrated schema

of Figure 8 is produced by applying the following

fusion operations:

•

Step 1 finds the transition trees composed

of a single node.

Step 2 finds the set of nodes all of whose

•

children are leaves not annotated with #.

Step 3 transforms each transition tree re-

•

duced to its root r into a single-column ta-

ble with key r . This transformation avoids

the lost of degenerated dimensions and

ensures that dimensions can be potentially

built on these nodes.

Step 4 builds a table

•

R n for each node n all

of whose children are leaves not annotated

with #. The columns of R n are the children

of n. If a child a of n a leading arc labeled

with ID, then it becomes the primary key

of R n ; if no such a child exists, then an

artificial primary key is generated via the

ADD_ID function. Note that the tables is-

sued from this step may be referred to by

relations created in step 5. For our example,

this step produces the tables RoomTypes ,

RoomFacilities , County and Singer of the

Figure 7.

Step 5 deals with tables referencing other

•

tables. For each such table, this step creates

a table with a primary key either the node

annotated with ID, or the concatenation

of its foreign keys when their number ex-

ceeds one, otherwise an artificial attribute

is added as a primary key.

Figure 7 shows the relational schema generated

by applying the XML2R algorithm on the transi-

tion trees of the e-Ticket DTD. In this schema,

primary keys are underlined and, foreign keys

are followed by the sharp sign (#) and the name

of the referenced relation.

Data Warehousing Design and Advanced Engineering Applications

Search WWH ::

Custom Search

Home