Fig. 7. (a) Update time of CluX compared to the compression and decompression times of CluX,
bzip2, and gzip; (b) update time required for an increasing number of parallel updates
As the document size scales (cf. Fig. 7(a)), direct updates on CluX are performed
faster than the compression and decompression of CluX and bzip2. For a document of
15 MB, the update on the compressed data is 3.5 times faster than decompression and
recompression by CluX and 4.4 times faster than decompression and recompression by
bzip2. Only gzip, which reaches a far weaker compression ratio than CluX, can be
decompressed and recompressed in less time than the update directly on the compressed
data requires. Finally, we examined the impact of parallel updates compared to
sequential updates. For this purpose, we randomly selected 100 paths of the grammar
DAG and relabeled the XML nodes defined by these paths. Fig. 7(b) shows that
performing 100 updates in parallel as a single multi-update operation is more than
70 times faster than performing 100 updates sequentially.
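The decompress-update-recompress baseline used in this comparison can be approximated with a short script. The following is only a minimal sketch under stated assumptions: the synthetic document from make_xml, the byte-level relabel helper, and the element names are all hypothetical, the relabeling renames every matching element rather than a single node, and CluX itself is not reproduced. It merely shows how the round-trip cost of gzip and bzip2 might be timed with the Python standard library.

```python
import bz2
import gzip
import time


def make_xml(n_items):
    """Build a synthetic, repetitive XML document (illustrative data only)."""
    items = "".join(
        f"<item id='{i}'><name>n{i}</name><price>{i % 97}</price></item>"
        for i in range(n_items)
    )
    return f"<catalog>{items}</catalog>".encode("utf-8")


def relabel(plain, old_tag, new_tag):
    """Rename every <old_tag> element by byte replacement (stand-in for an update)."""
    plain = plain.replace(b"<" + old_tag + b">", b"<" + new_tag + b">")
    return plain.replace(b"</" + old_tag + b">", b"</" + new_tag + b">")


def roundtrip_update(blob, codec, old_tag, new_tag):
    """Decompress, relabel, recompress; return (new_blob, elapsed_seconds)."""
    start = time.perf_counter()
    updated = relabel(codec.decompress(blob), old_tag, new_tag)
    result = codec.compress(updated)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    doc = make_xml(20_000)
    for codec in (gzip, bz2):
        packed = codec.compress(doc)
        _, seconds = roundtrip_update(packed, codec, b"name", b"label")
        print(f"{codec.__name__}: {len(packed)} bytes, round trip {seconds:.4f} s")
```

Absolute timings from such a script depend on the machine, the compression level, and the document, so they are not comparable to the figures reported above.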
5 Related Work
Besides generic compressors like gzip, bzip2, or 7zip (based on LZMA), none of which
allows direct query evaluation on the compressed data, there are several
approaches to XML structure compression. XML structure compression can be
divided mainly into three categories: encoding-based compressors, schema-based
compressors, and grammar-based compressors.
The encoding-based compressors allow faster compression than the others, since only
local data has to be considered during compression, as opposed to considering
different sub-trees as in grammar-based compressors. Examples of encoding-based
approaches are [13], [6], and [7], XMill [8], XPRESS [9], XGrind [14], and [1].
XMill is not queryable, i.e., it does not support navigation or the evaluation of
XPath queries directly on the compressed document without prior decompression;
all other approaches are queryable.
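To illustrate why considering whole sub-trees is more costly than a purely local encoding, the following minimal sketch shares identical sub-trees of an XML structure in a DAG. It is a generic subtree-sharing technique and not the grammar construction of CluX or of any cited compressor; the function name share_subtrees and the table layout are assumptions made for illustration, and text content and attributes are ignored.

```python
import xml.etree.ElementTree as ET


def share_subtrees(root):
    """Return (table, root_id); each distinct sub-tree is stored exactly once."""
    ids = {}    # (tag, child ids) -> node id
    table = []  # node id -> (tag, child ids)

    def visit(elem):
        # A sub-tree is identified by its tag and the ids of its children,
        # so every child must be processed before its parent.
        key = (elem.tag, tuple(visit(child) for child in elem))
        if key not in ids:
            ids[key] = len(table)
            table.append(key)
        return ids[key]

    root_id = visit(root)
    return table, root_id


if __name__ == "__main__":
    tree = ET.fromstring("<a><b><c/></b><b><c/></b></a>")
    table, root_id = share_subtrees(tree)
    print(len(table), table[root_id])
```

Each element requires a lookup keyed on its entire sub-tree, whereas an encoding-based compressor can emit a code for a node as soon as it is read.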
Schema-based compression comprises such approaches as XCQ [2], XAUST [15],
Xenia [3], and XSDS [10]. They subtract the given schema information from the
structural information. Instead of a complete XML structure stream or tree, they only