respects the DTD, and if so, the XML document
is said to be valid.
Note that other formalisms exist for describing
XML document structures, e.g., XML Schema and
DCD (Document Content Description). However,
the DTD is a formalism recommended by the
World Wide Web Consortium (W3C) (W3C, 2008).
For this reason, we assume in our method that
the structure of an XML document is described
by a DTD and that the XML source documents
are valid.
A DTD is composed of element types, sub-element
types, attributes, and terminal strings such as
ENTITY, PCDATA and CDATA. DTD typing is,
however, very limited, since all values are
treated as strings. In addition, a DTD can
constrain the occurrences of an element or
sub-element type through the symbols “*” (a set
with zero or more elements), “+” (a set with one
or more elements), “?” (an optional element),
and “|” (alternative elements). For more details
about DTDs, the reader is referred to (Sahuguet,
2000) and (W3C, 2008).
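As a rough illustration of the occurrence symbols just listed, the following Python sketch maps each symbol to a (minimum, maximum) cardinality for the children of one element declaration. The declaration string and the function name are invented for this example and are not taken from the e-Ticket DTD. The sketch only handles a flat, comma-separated list of child names; nested groups and "|" alternatives are left out.

```python
import re

# Cardinality implied by each DTD occurrence indicator: (min, max),
# where None means "unbounded". No indicator means exactly one.
CARDINALITY = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}

# Hypothetical declaration, invented for illustration only.
DECLARATION = "<!ELEMENT Booking (Consumer, Room+, Comment?, Concert*)>"


def child_cardinalities(declaration):
    """Return {child: (min, max)} for a flat, comma-separated content model.

    Only sequences of plain names are handled; nested groups and "|"
    alternatives are deliberately out of scope for this sketch.
    """
    match = re.fullmatch(r"<!ELEMENT\s+(\S+)\s+\((.*)\)>", declaration.strip())
    if match is None:
        raise ValueError("not a simple element declaration")
    result = {}
    for item in match.group(2).split(","):
        item = item.strip()
        name = item.rstrip("?*+")       # child element name
        indicator = item[len(name):]    # trailing "", "?", "*" or "+"
        result[name] = CARDINALITY[indicator]
    return result


cards = child_cardinalities(DECLARATION)
```

Here `cards["Room"]` is `(1, None)`, i.e. one or more rooms, while `cards["Comment"]` is `(0, 1)`, i.e. an optional comment.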
Figure 2 depicts an example of a DTD describing
e-Ticket documents. An e-Ticket document
describes a hotel booking made by a consumer
and/or the list of concerts that the consumer
can attend. Such documents can be used by an
online broker that deals with a particular hotel
and offers entertainment services (in this case,
concert ticket purchase).
there exists an attribute K such that $t_i(K) \neq t_j(K)$,
then such an attribute (set of attributes) is called
a candidate key. It is common to choose one of
the candidate keys as the primary key used to
uniquely identify tuples in the relation.
Furthermore, a set of attributes FK in a relation
$R_1$ is a foreign key if the following two rules are
satisfied: (1) the attributes in FK have the same
domain as the primary key PK of another relation
$R_2$, and (2) every value of FK in any tuple $t_1$ in
$R_1$ either occurs as a value of PK for some tuple
$t_2$ in $R_2$ or is null.
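The key definitions above can be checked on a concrete relation instance, as in the minimal Python sketch below. It assumes that tuples are represented as dictionaries and that null is represented by `None`. The relation contents, attribute names and function names are invented for illustration and do not reproduce Figure 3. The sketch tests only the uniqueness condition on the given tuples and rule (2) of the foreign key definition; it does not compare attribute domains (rule 1).

```python
def is_candidate_key(tuples, attrs):
    """True if no two distinct tuples agree on every attribute in attrs."""
    seen = set()
    for t in tuples:
        value = tuple(t[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def satisfies_foreign_key(r1, fk, r2, pk):
    """True if each FK value in r1 is null (None) or matches a PK value in r2."""
    pk_values = {tuple(t[a] for a in pk) for t in r2}
    for t in r1:
        value = tuple(t[a] for a in fk)
        if all(v is None for v in value):
            continue  # a null foreign key is allowed
        if value not in pk_values:
            return False
    return True


# Hypothetical relation instances, invented for illustration only.
ROOMS = [
    {"room_no": 101, "type": "single"},
    {"room_no": 102, "type": "double"},
    {"room_no": 103, "type": "double"},
]
BOOKINGS = [
    {"booking_id": 1, "room_no": 101},
    {"booking_id": 2, "room_no": None},
]

room_no_is_key = is_candidate_key(ROOMS, ["room_no"])
type_is_key = is_candidate_key(ROOMS, ["type"])
fk_holds = satisfies_foreign_key(BOOKINGS, ["room_no"], ROOMS, ["room_no"])
```

In this instance `room_no` distinguishes every room, whereas `type` does not, because two rooms share the value "double". The booking with a null `room_no` does not violate the foreign key.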
Figure 3 shows an example of a relational
database modeling a hotel room booking system.
This example is adapted from the one presented
in (Databasedev.co.uk, 2008). In our schema, the
primary keys are underlined and the foreign keys
are followed by the sharp sign (#) and the name
of the referenced relation.
DATA SOURCE PRETREATMENT
This first step of our design method aims at
structurally homogenizing the data sources by
transforming the source DTD into a set of relations
(i.e., tables). It is conducted through an automatic
process composed of four stages: DTD simplification,
transition tree construction, transition tree
enrichment, and relational schema generation.
DTD Simplification
The simplification of a DTD removes empty
elements, and substitutes and transforms other
elements. Empty element removal is applied
to every element that is tagged EMPTY and that
does not declare an ATTLIST. Such an element
has no content in valid XML documents; thus,
it is not useful for the decision process. Element
substitution, on the other hand, first replaces
each reference to an ENTITY type with the text
corresponding to that entity, and then removes
the corresponding ENTITY declaration.
Basic Relational Model Concepts
In the relational model (Codd, 1970), a database
is modeled by a set of relations (also called tables)
that forms a relational schema. A relation, denoted
$R(A_1, A_2, \ldots, A_n)$, has a name ($R$) and a list of
attributes ($A_1, A_2, \ldots, A_n$), each of which is
associated with one domain representing the set of its
possible values. Each attribute $A_i$ is the name of
a role played by its domain $D$ in the relation $R$.
Thus, a relation $R$ represents a set of tuples
$t_1, t_2, \ldots, t_m$. If for any two distinct tuples $t_i$ and $t_j$ in $R$