Reverting the Effects of XQuery Update Expressions - Advances in Databases

Database Reference

In-Depth Information

restored between two encountered nodes, ensuring that the original document nodes

are restored in their original position. Therefore, since the three properties are met by

Algorithm 2, we are guaranteed that the original document is produced.

Proposition 5 (Complexity). Let Δ ↔ be a completed PUL of size n applicable on a

document D ,let D

O ( Δ ↔ ,D )

composed of d nodes, and let i and

r be the number of nodes inserted and removed by Δ ↔ , respectively. The complexity of

Algorithm 2 is

be a document in

O ( n log n + d + i + r )

Proof Sketch. The identification of the nodes to be removed and the original name

and value of updated nodes can rely on a hash-table, which requires

O ( n )

for its ini-

tialization and

for retrievals. For what concerns the restoration of nodes, a more

efficient representation of R is a list ordered according to the pre-order traversal of the

subtrees roots, with cost

O (1)

O ( n log n )

. Moreover, since events are generated according to

a pre-order visit of the tree, identifying whether there is a subtree to restore, requires to

check only the first element in the list, with cost

, assuming that subtrees removed

multiple times are pruned during the inverse application. Assessing whether the parent

of a node has been removed requires constant time, provided that this information is

stored in a state variable. Thus, since all nodes in the updated document need to be

processed and up to r nodes need to be restored, the complexity of the algorithm is

O ( n log n + d + i + r )

O (1)

Evaluation

In this section we discuss some experiments we have conducted by means of the ex-

tended Qizx library that is able to generate and process both PULs and completed

PULs represented as XML documents containing the serialization of each operation

along with the identifiers and labels of the target nodes. Our test machine uses an Intel

I5 760 processor, 16GB of RAM, and runs the Sun JDK v.1.6.20.

To assess the computational costs of the inversion operator and contrast the pecu-

liar advantages of PULs and completed PULs, we exploit documents of various sizes,

ranging from 16MB to 256MB, produced by means of the XMark data generator. Node

identifiers and labeling have been stored within the corresponding documents through

an encoding of their XDM structure. On such documents, synthetic XQU expressions

and their corresponding PULs/completed PULs have been generated with a varying

number of operations that are equally distributed among the operation types.

Starting from an XQU expression the generation of a PUL Δ /completed PUL Δ ↔

requires to load the whole document in memory before the actual evaluation of the

XQuery Update expression. When the expression is evaluated, generating a completed

PUL has the additional cost of retrieving and storing within the PUL itself all the mod-

ified values and removed nodes. Analogously, the generation of PUL inversion requires

to load the whole document in memory. The application of the so generated PULs, by

contrast, requires to load the whole PUL in memory and then update the corresponding

document through a single scan identifying a new document. While Δ and Δ − 1 seri-

alizations contain only the strictly required information for their application, the serial-

ization of Δ ↔ contains extra information that is discarded depending on the direction

(forward or backward) of application.

Advances in Databases

Search WWH ::

Custom Search

Home