Database Reference
In-Depth Information
Relational Schema Integration
set of all transition trees and operates according
to the following five steps:
At this stage, we have produced one relational
schema from the relational database and a second
one from the DTD/XML documents. However,
these two schemas represent one “virtually single”
data source used to load the DM. Thus, they
must be integrated to represent conceptually one,
coherent database. In other words, any semantic
heterogeneity must be resolved at this stage.
Quite existing works treat semantic heteroge-
neity in relational databases, cf. , (Bright, Hurson,
& Pakzad, 1994), (Sheth, & Larson, 1990), (Ceri,
Widom, 1993) , (Hull,1997) (Zhang, & Yang,
2008) and XML document storage as relational
databases, cf. , (Ceri, Fraternali, & Paraboschi,
2000), (Deutsch, Fernandez, Suciu, 1999), (Lee,
& Chu, 2000), (Schmidt, Kersten, Windhouwer,
& Waas, 2000), (Kappel, Kapsammer, & Retsch-
itzegger, 2001). In addition, selected schema
heterogeneity issues were treated in the context
of XML and relational database integration, cf. ,
(Kappel, Kapsammer, Retschitzegger, 2000).
These works, used in the context of database
model transformation, can be also used to resolve
the semantic heterogeneity within our design
method.
Overall, the works on relational DB schema
integration proceed in three steps: Pre-integration
where the different source models are transformed
into the same model, schema correspondence that
resolves naming conflicts; and schema fusion that
produces a global schema by replacing equivalent
schemas, regrouping intersecting schemas and
collecting independent schemas. In our method,
the pre-integration step is treated through our pre-
treatment step, which leaves us with the schema
correspondence and fusion steps.
To illustrate the integration step, let us revisit
our running example (Figures 3 and 7) which re-
quires only the fusion step. The integrated schema
of Figure 8 is produced by applying the following
fusion operations:
Step 1 finds the transition trees composed
of a single node.
Step 2 finds the set of nodes all of whose
children are leaves not annotated with #.
Step 3 transforms each transition tree re-
duced to its root r into a single-column ta-
ble with key r . This transformation avoids
the lost of degenerated dimensions and
ensures that dimensions can be potentially
built on these nodes.
Step 4 builds a table
R n for each node n all
of whose children are leaves not annotated
with #. The columns of R n are the children
of n. If a child a of n a leading arc labeled
with ID, then it becomes the primary key
of R n ; if no such a child exists, then an
artificial primary key is generated via the
ADD_ID function. Note that the tables is-
sued from this step may be referred to by
relations created in step 5. For our example,
this step produces the tables RoomTypes ,
RoomFacilities , County and Singer of the
Figure 7.
Step 5 deals with tables referencing other
tables. For each such table, this step creates
a table with a primary key either the node
annotated with ID, or the concatenation
of its foreign keys when their number ex-
ceeds one, otherwise an artificial attribute
is added as a primary key.
Figure 7 shows the relational schema generated
by applying the XML2R algorithm on the transi-
tion trees of the e-Ticket DTD. In this schema,
primary keys are underlined and, foreign keys
are followed by the sharp sign (#) and the name
of the referenced relation.
Search WWH ::




Custom Search