TraCX: Transformation of Compressed XML - Advances in Databases

Database Reference

In-Depth Information

In order to illustrate the process of copying a sub-tree from the source Succinct re-

presentation into the target Succinct representation, we assume that we want to copy

the sub-tree BS' of the source document that is represented by the positions 18 to 21

of the bit stream BS given in Fig. 3. The affected bits and inverted element list entries

are emphasized by bold letters. Furthermore, we assume that we want to append them

to the target document that up to now consists of the positions 0 to 11 of the bit

stream BT of the target document of Fig. 3, i.e., we want to insert the copied sub-tree

at position k=12.

Having copied the bits from position n (i.e. 18) to position m (i.e. 21) of BS to BT,

we run through the inverted element list and copy all positions x with n≤x≤m (i.e.

position 18 of the list “city” and position 19 of the list “=T”). As we only append new

nodes to the current end of the target document, we do not have to re-compute index

positions within the inverted element lists of nodes that have been previously added to

the target document. In order to insert the copied sub-tree at position k (i.e. 12), we

add k-n to each copied position (i.e., we add 12-18=-6 to the positions 18 and 19 and

yield the new positions 12 and 13). Then, we can insert the list of positions of the

copied sub-tree into the list of positions of the target document for each label (i.e., we

add positions 12 to the list “city” of BT and position 13 to the list “=T” of BT).

Even if we have to copy more than one sub-tree, we can copy the relevant parts of

the bit stream and the inverted element lists within a single pass. We first sort the

positions of the sub-trees to be copied in ascending order, process the bit stream and

copy all relevant bits, and determine the start and end positions of the copied sub-

trees. Then, we can run a single pass through the inverted element lists in which we

adapt the positions and insert the positions that were copied.

3 Evaluation of Our Prototype Implementation

To evaluate the performance of our Java prototype implementation, we conducted

several tests. We ran the tests on an Intel Core i5 450M with 2.4 GHz and 4 GB

memory using Linux and JRE 1.6.0. The tests compare our prototype implementation

to version 1.4.0 of the XQuery evaluator Zorba (http://www.zorba-xquery.com/),

started by zorba -f -q queryFile -o outputFile.

For each test, we compared five methods of transformation. Three methods use our

prototype implementation: (1) transforming uncompressed data, (2) indirectly trans-

forming data compressed via Succinct, i.e., first decompression, then uncompressed

transformation, and, finally, compression of the transformation's result and (3) direct

transformation of data compressed via Succinct. Two methods use Zorba: (4) trans-

forming uncompressed data and (5) indirectly transforming data compressed via Suc-

cinct as explained above. With Zorba, it is not possible to transform compressed data

directly.

In the first test, we use an XML source file of variable size consisting of the root

node <Country > that itself consists of a single <Name> node and of N <City > nodes.

Based on that XML source file, we evaluate an XQuery that selects the root node

<Country > . Afterwards, using the root node as context node, it evaluates the two

relative XPath expressions $country/Name and $country/City . For each <City > node,

the selected nodes are copied to the transformation's target document.

Advances in Databases

Search WWH ::

Custom Search

Home