Database Reference
In-Depth Information
PATTERN-BASED APPROACH TO XML SCHEMAS AND INSTANCES
XML Schemas
Schemas for XML data are usually specified by means of XSDL ( XML Schema Definition Language )
(XML Schema, 2009) or DTD ( Document Type Definition ) (Martens et al., 2007). The aim of such a
schema is to control well-formedness and validity of XML documents. We assume that the structure of
XML documents conform to their DTDs and that attributes are represented by so called terminal ele-
ments labeled with terminal labels of Text type (we ignore mixed elements).
Let L be a set of labels, Ter L be a set of terminal labels, and top L - Ter be the outermost (top)
label.
Definition 1. ( DTD ) A tuple D = ( top , L , Ter , σ) is a DTD, if σ is a function assigning regular expres-
sions over L - { top } to non-terminal labels, and Text to terminal labels, i.e.
σ: L - Ter Reg
σ: Ter → { Text },
the set Reg of regular expressions is defined by the grammar:
e ::= l | ( e ) | e ? | e* | e + | e + e | e e .
Example 1. In Figure 1 there are graphical representations of the following DTDs ( D a is a recursive
DTD, while D 1 is non-recursive):
D 1 = ( pubs , { pubs, pub, title, year, author, name, university }, { title, year,
name, university }, σ),
where
σ( pubs ) = pub *, σ( pub ) = title year ? author +, σ( author ) = name university ?
σ( title ) = σ( year ) = σ( name ) = σ( university ) = Text .
D a = ( parts , { parts , part , pid }, { pid }, σ),
where:
σ( parts ) = part *, σ( part ) = pid part *, σ( pid ) = Text .
In this chapter, we assume that all XML documents are valid, so we will not be interested in checking
their validity against DTDs. Instead, we are interested in such a formal description of the structures of
XML documents that would be convenient for defining mappings and transforming data in XML data
integration processes.
Since every XML document is a finite tree, thus its structure can be specified by a tree pattern (Xu
& Ozsoyoglu, 2005). For documents having recursive DTDs their tree patterns are restricted to finite
depths. The notion of tree-patterns can be used to define tree-pattern formulas (Arenas & Libkin, 2005).
A tree-pattern formula arises from a tree pattern by assigning variables to terminal labels (i.e. to paths
starting in the root and ending with this terminal label). In Figure 1 the tree-pattern formulas, S 1 and S a ,
Search WWH ::




Custom Search