TraCX: Transformation of Compressed XML - Advances in Databases

Database Reference

In-Depth Information

are created by navigating through the XML representation. Whenever a result of an

XPath expression is reached, the result is bound to the XQuery variable $v assigned to

this XPath expression in the transformation instruction, and the automata for all rela-

tive expressions that refer to the variable $v as context node are turned active. If we

consider for example the source document and the XQuery transformation given in

Fig. 1, we have the absolute XPath expression XPa=/cont/country, the results of

which are bound to the variable $country and the relative XPath expression

XPr=$country/city. So whenever whenever a result of XPa is found (e.g. in line (2) of

the source document), the automaton for XPr is started, and from that moment on, the

XML events are not only passed to the automaton of XPa, but also to the automaton

of XPr. Only when the end of the result is reached, i.e., the end-tag of the root of the

sub-tree that is the result of the query XPa is read (e.g. in line (5) of the source docu-

ment), XPr is stopped, and it does not receive any further XML events.

In order to avoid any unnecessary parsing and decompression, the automata send

feedback to the source, which navigation events are currently relevant, i.e., they report

the set of labels of outgoing transitions of all currently active states. If we consider

e.g. the absolute XPath expression /cont/country, and if the current node CN of any

given source document is at level 2 (i.e., CN is a grand-child of the document root),

navigating down to any level below via the first-child axis cannot lead to a result.

Therefore, the whole sub-tree having the parent CN can be skipped and does not need

to be decompressed.

This evaluation of absolute and relative XPath expressions describes a generic op-

timization for XQuery evaluation that can be applied to all XML representations sup-

porting the basic operations, but this generic optimization need not be applied. Note

that whenever an XML representation supports a more efficient XPath evaluation for

a given XML document, this evaluation can be used instead of the generic part.

Navigation in Uncompressed XML. In order to navigate via the binary axes within

an uncompressed XML document, we have to consider pairs of tags. Reading a

start-tag following another start-tag corresponds to a first-child event. Reading a start-

tag following an end-tag corresponds to a next-sibling event. Reading an end-tag

following another end-tag corresponds to an end-of-parent event. Reading an end-tag

following a start-tag does not generate an event at all. Finally, if the current tag is a

start-tag with label 'a', this corresponds to a self::a event.

To skip a sub-tree means 'ignoring' its tags, i.e. skipping as many tags, until the

number of the skipped start-tags is equal to the number of skipped end-tags.

Navigation in Succinct Compression. The Succinct Compression [6] stores the tree

structure of the XML document in a bit stream that contains a '1'-bit for each start-tag

and a '0'-bit for each end-tag within the document. The labels of the document tree

are stored in so called inverted element lists, that contain for each label L the list of

positions of '1'-bits within the bit stream that correspond to an element with label L.

For example, Fig. 3 shows the bit stream BS and the inverted element lists IES of

the source document of Fig. 1(a) and the bit stream BT and the inverted element lists

IET of the target document of Fig. 1(b). The data structure P is the index of positions

within BS and BT to which the entries in the inverted element lists refer. The inverted

element list with label “=T” represents positions of text values within the bit stream,

and each text node is represented as a bit sequence '10' in the bit stream.

Advances in Databases

Search WWH ::

Custom Search

Home