Database Reference
In-Depth Information
Schema languages for XML
Relational schemas essentially specify a list of attributes (columns) for every relation. In
the case of XML one also needs to specify a set of attributes for each label. This, however,
is insufficient, as schemas must describe the hierarchical structure of XML documents too.
To see how this can be done, let us revisit again the example in Figure 10.1 . If we look at
the root and read the labels of its children, from left to right, then we get a string country
country . If we look at the Scotland node and read the labels of its children, then we see a
string ruler ruler ruler ruler . It is a common approach to describe schemas in terms
of the allowed strings obtained in this way. The simplest such schemas are Document
Type Definitions ,or DTDs . They put restrictions on strings of labels in terms of regular
languages they belong to.
For example, the tree from Figure 10.1 conforms to the DTD below:
country
europe
country :@ name
ruler
country
ruler :@ name
The first column restricts regular expressions of strings of labels. It says that labels of
children of a europe node must come from the language country , i.e., be a sequence
of country labels, and that labels of children of a country node must come from the
language ruler . The right column gives a list of attributes: in this case, the attribute
@name is defined for country and ruler nodes. Formally, DTDs are defined as follows.
Definition 10.4 (DTDs) A DTD D over an alphabet
with a distinguished symbol r (for
the root) and a set of attributes Att is a triple ( P D , A D , r ) where:
Γ
P D is a mapping from
Γ
to regular expressions over
Γ −{ r }
, which one typically writes
as productions
e if P D ( )= e .
2 Att
A D is a mapping A D :
Γ
that assigns a (possibly empty) set of attributes to each
element type.
The size of D , written as
D
, is the total length of all used regular expressions, plus
|
Att
|
.
Leaf nodes do not have any children; for instance, ruler nodes are such. Technically,
it means that they have an empty string of children. For them, DTDs will have rules like
ruler ε
; however, we usually omit them when we describe DTDs. So, if there is no
rule of the form
.
We always assume, for notational convenience, that attributes of a node (if there are
several of them) come in some order. This is the same as in the relational case: attributes
in tuples come in some order so we can write R ( a 1 ,..., a n ). Likewise, we shall describe an
-labeled tree node with n attributes as ( a 1 ,..., a n ).
Atree T conforms to a DTD D =( P D , A D , r ), written as T
e in the description of a DTD, we assume that
ε
|
= D ,if:
1. its root is labeled r ;
2. for each node labeled , the labels of its children, read left-to-right, form a string in the
language of P D ( );and
Search WWH ::




Custom Search