Updates on Grammar-Compressed XML Data - Advances in Databases

Fig. 7. (a) Update time of CluX compared to the compression and decompression times of CluX, bzip2, and gzip; (b) update time required for an increasing number of parallel updates

As the document size grows (cf. Fig. 7(a)), direct updates on CluX-compressed data can be performed faster than decompressing and recompressing the document with CluX or bzip2. For a document of 15 MB, the update on the compressed data is 3.5 times faster than decompression and recompression by CluX and 4.4 times faster than decompression and recompression by bzip2. Only with gzip, which achieves a far weaker compression ratio than CluX, can the document be decompressed and recompressed in less time than the update directly on the compressed data requires. Finally, we examined the impact of parallel updates compared to sequential updates. For this purpose, we randomly selected 100 paths of the grammar DAG and relabeled the XML nodes defined by these paths. Fig. 7(b) shows that performing the 100 updates in parallel as a single multi-update operation is more than 70 times faster than performing them sequentially.
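To illustrate why a multi-update can beat a sequence of single updates, the following Python sketch relabels nodes in a minimal DAG of shared subtrees, used here as a simplified stand-in for CluX's grammar. The class `Dag` and the function `relabel` are hypothetical and are not the authors' algorithm. Because a node may be shared by several parents, it cannot be changed in place: every node on an updated path is rebuilt (path isolation), while untouched subtrees stay shared. Processing all paths in one traversal, grouped by their first step, visits each common path prefix only once.

```python
from collections import defaultdict


class Dag:
    """Hypothetical minimal DAG of shared subtrees (a simplified stand-in
    for a grammar); identical (label, children) pairs share one node id."""

    def __init__(self):
        self.table = {}   # (label, child ids) -> node id
        self.nodes = []   # node id -> (label, child ids)
        self.visits = 0   # number of nodes touched by relabel()

    def intern(self, label, children):
        key = (label, tuple(children))
        if key not in self.table:
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]

    def expand(self, node):
        """Decompress the subtree rooted at `node` into nested tuples."""
        label, children = self.nodes[node]
        return (label, tuple(self.expand(c) for c in children))


def relabel(dag, node, paths, new_label):
    """Relabel every node reached by a path in `paths` (tuples of child
    indices relative to `node`) and return the id of the new subtree root.

    Nodes are never modified in place: nodes on an updated path are rebuilt,
    untouched subtrees stay shared. Paths are grouped by their first step,
    so a prefix common to several paths is visited only once."""
    dag.visits += 1
    label, children = dag.nodes[node]
    if any(len(p) == 0 for p in paths):
        label = new_label
    groups = defaultdict(list)
    for p in paths:
        if p:
            groups[p[0]].append(p[1:])
    new_children = list(children)
    for i, rest in groups.items():
        new_children[i] = relabel(dag, children[i], rest, new_label)
    return dag.intern(label, new_children)


# Document r(a(b, c), a(b, c), a(b, c)): the three a-subtrees share one node.
dag = Dag()
a = dag.intern("a", [dag.intern("b", []), dag.intern("c", [])])
root = dag.intern("r", [a, a, a])
paths = [(0, 0), (0, 1), (2, 1)]

dag.visits = 0
seq_root = root
for p in paths:  # sequential: one traversal from the root per path
    seq_root = relabel(dag, seq_root, [p], "x")
seq_visits = dag.visits

dag.visits = 0
multi_root = relabel(dag, root, paths, "x")  # multi-update: one traversal
multi_visits = dag.visits

print(dag.expand(multi_root))
print(seq_root == multi_root, seq_visits, multi_visits)
```

With these three paths, the sequential variant visits 9 nodes and the multi-update 6, and both return the same root id because structurally identical subtrees are interned to the same node. The sketch does not model CluX's own update and recompression steps, so it says nothing about the size of the speedup reported for Fig. 7(b).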

5 Related Work

Besides generic compressors like gzip, bzip2, or 7zip (based on LZMA), none of which allows direct query evaluation on the compressed data, there are several approaches to XML structure compression. These can be divided mainly into three categories: encoding-based compressors, schema-based compressors, and grammar-based compressors.

Encoding-based compressors allow faster compression than the other two categories, because only local data has to be considered during compression, whereas grammar-based compressors have to consider different subtrees. Examples of encoding-based approaches are [13], [6], [7], XMill [8], XPRESS [9], XGrind [14], and [1]. Whereas XMill is not queryable, i.e., it does not support navigation or the evaluation of XPath queries directly on the compressed document without prior decompression, all other approaches are queryable.
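The difference between local encoding and cross-subtree sharing can be sketched in Python on the element structure alone (text and attributes are ignored). `encode_tags` and `share_subtrees` are hypothetical helpers that do not reproduce any of the cited systems: the first replaces each tag by an integer code in a single pre-order pass, handling every element in isolation as encoding-based compressors do; the second builds a minimal DAG in which identical subtrees receive one shared id, the simplest form of the sharing that grammar-based compressors exploit.

```python
import xml.etree.ElementTree as ET


def encode_tags(root):
    """Encoding-based sketch: emit a pre-order token stream in which each
    start tag becomes an integer code and each end tag becomes -1.
    Every element is encoded on its own (purely local work)."""
    codes, stream = {}, []

    def visit(elem):
        stream.append(codes.setdefault(elem.tag, len(codes)))
        for child in elem:
            visit(child)
        stream.append(-1)

    visit(root)
    return codes, stream


def share_subtrees(root):
    """Grammar-like sketch (minimal DAG): identical subtrees anywhere in the
    document are detected by comparing whole subtrees and get one id."""
    table = {}  # (tag, child ids) -> subtree id

    def visit(elem):
        key = (elem.tag, tuple(visit(child) for child in elem))
        return table.setdefault(key, len(table))

    return visit(root), table


doc = ET.fromstring("<r><a><b/><c/></a><a><b/><c/></a><a><b/><c/></a></r>")
codes, stream = encode_tags(doc)
root_id, table = share_subtrees(doc)
print(codes, stream)
print(root_id, len(table))  # 10 elements, but only 4 distinct subtrees
```

Real grammar-based compressors go further than this DAG and also share repeated tree patterns that are not complete subtrees, which requires analyzing the document more globally.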

Schema-based compression comprises approaches such as XCQ [2], XAUST [15], Xenia [3], and XSDS [10]. They subtract the given schema information from the structural information. Instead of a complete XML structure stream or tree, they only
