10 The problem of XML data exchange
In this chapter we shall study data exchange for XML documents. XML itself was invented
as a standard for data exchange on the Web, albeit under a different interpretation of the
term “data exchange”. In the Web context, it typically refers to a common, flexible format
that everyone agrees on, and that, therefore, facilitates the transfer of data between different
sites and applications. When we speak of data exchange, we mean transforming databases
under different schemas with respect to schema mapping rules, and querying the exchanged
data.
10.1 XML documents and schemas
In this section we review the basic definitions regarding XML. Note that a simple example
was already shown in Chapter 1. XML documents have a hierarchical structure, usually
abstracted as a tree. An example is shown in Figure 10.1. This document contains information
about rulers of European countries. Its structure is represented by a labeled tree; in
this example, the labels are europe, country, and ruler. In the XML context, these are
referred to as element types. We assume that the labels come from a finite labeling alphabet
and correspond, roughly, to relation names from the classical relational setting.
The root of the tree is labeled europe, and it has two children that are labeled country.
These have data values, given in parentheses: the first one is Scotland, and the second
one is England. Each country in turn has a set of rulers. That is, the children of each
country node are labeled ruler, and have associated data values assigned to them, for
example, James V. These data values come from a potentially infinite set (e.g., of strings,
or numbers). We also assume that, in general, children of each node are ordered; normally
this order is interpreted as going from left to right in the picture. That is, James V is the
first child of the Scotland node, and Charles I is the last. In our example, this corresponds
to the chronological order.
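The tree abstraction described above can be made concrete with a small sketch. The Python
fragment below is only an illustration, not a definition used later in the chapter: the class
name Node, its field names, and the helper labels are hypothetical, and the tree contains only
the nodes named in the text (Figure 10.1 has more rulers, which are omitted here rather than
guessed). Each node carries a label from the finite alphabet, an optional data value, and an
ordered list of children.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node of an ordered, labeled tree with an optional data value."""
    label: str                    # element type, from a finite alphabet
    value: Optional[str] = None   # data value, from a potentially infinite set
    children: List["Node"] = field(default_factory=list)  # left-to-right order


# Only the nodes mentioned in the text; the remaining rulers are left out.
scotland = Node("country", "Scotland", [
    Node("ruler", "James V"),     # first child of the Scotland node
    Node("ruler", "Charles I"),   # last child of the Scotland node
])
england = Node("country", "England")
root = Node("europe", children=[scotland, england])


def labels(node: Node) -> set:
    """Return the set of element types used in the subtree rooted at node."""
    found = {node.label}
    for child in node.children:
        found |= labels(child)
    return found
```

Calling labels(root) collects the three element types of the example, and the order of each
children list plays the role of the left-to-right order in the picture.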
In general, a node may have more than one data value. We assume, under the analogy
between node labels and relation names, that each node has some attributes that store data
values associated with it.