Database Reference
In-Depth Information
restored between two encountered nodes, ensuring that the original document nodes
are restored in their original position. Therefore, since the three properties are met by
Algorithm 2, we are guaranteed that the original document is produced.
Proposition 5 (Complexity). Let Δ be a completed PUL of size n applicable on a
document D ,let D
O ( Δ ,D )
composed of d nodes, and let i and
r be the number of nodes inserted and removed by Δ , respectively. The complexity of
Algorithm 2 is
be a document in
O ( n log n + d + i + r )
.
Proof Sketch. The identification of the nodes to be removed and the original name
and value of updated nodes can rely on a hash-table, which requires
O ( n )
for its ini-
tialization and
for retrievals. For what concerns the restoration of nodes, a more
efficient representation of R is a list ordered according to the pre-order traversal of the
subtrees roots, with cost
O (1)
O ( n log n )
. Moreover, since events are generated according to
a pre-order visit of the tree, identifying whether there is a subtree to restore, requires to
check only the first element in the list, with cost
, assuming that subtrees removed
multiple times are pruned during the inverse application. Assessing whether the parent
of a node has been removed requires constant time, provided that this information is
stored in a state variable. Thus, since all nodes in the updated document need to be
processed and up to r nodes need to be restored, the complexity of the algorithm is
O ( n log n + d + i + r )
O (1)
.
5
Evaluation
In this section we discuss some experiments we have conducted by means of the ex-
tended Qizx library that is able to generate and process both PULs and completed
PULs represented as XML documents containing the serialization of each operation
along with the identifiers and labels of the target nodes. Our test machine uses an Intel
I5 760 processor, 16GB of RAM, and runs the Sun JDK v.1.6.20.
To assess the computational costs of the inversion operator and contrast the pecu-
liar advantages of PULs and completed PULs, we exploit documents of various sizes,
ranging from 16MB to 256MB, produced by means of the XMark data generator. Node
identifiers and labeling have been stored within the corresponding documents through
an encoding of their XDM structure. On such documents, synthetic XQU expressions
and their corresponding PULs/completed PULs have been generated with a varying
number of operations that are equally distributed among the operation types.
Starting from an XQU expression the generation of a PUL Δ /completed PUL Δ
requires to load the whole document in memory before the actual evaluation of the
XQuery Update expression. When the expression is evaluated, generating a completed
PUL has the additional cost of retrieving and storing within the PUL itself all the mod-
ified values and removed nodes. Analogously, the generation of PUL inversion requires
to load the whole document in memory. The application of the so generated PULs, by
contrast, requires to load the whole PUL in memory and then update the corresponding
document through a single scan identifying a new document. While Δ and Δ 1 seri-
alizations contain only the strictly required information for their application, the serial-
ization of Δ contains extra information that is discarded depending on the direction
(forward or backward) of application.
Search WWH ::




Custom Search