Database Reference
In-Depth Information
Updates on Grammar-Compressed XML Data
Alexander Bätz, Stefan Böttcher, and Rita Hartel
University of Paderborn, Computer Science, Fürstenallee 11, 33102 Paderborn, Germany
{laures,stb,rst}@uni-paderborn.de
Abstract. In this paper, we present updates on CluX, a grammar-based XML
compression approach based on clustering XML sub-trees. We show that
updates on CluX-compressed data can be performed faster than decompressing
the data, loading it into main memory and compressing it. Furthermore, we
show how to support fast multiple updates, e.g. performing 100 updates in
parallel is more than 70 times faster than 100 single updates.
Keywords: updating compressed XML data, grammar-based compression.
1 Introduction
Motivation: XML is widely used in business applications and is the de facto standard
for information exchange among different enterprise information systems. Whenever
the amount of processed XML data is a bottleneck, it is desirable that applications can
directly query and update compressed XML data without having to decompress the
data before accessing it.
There have been different contributions to the field of XML compressors
generating queryable XML representations, that range from encoding-based [1] to
schema-based [2], [3] to DAG-based [4] to grammar-based [5] compressed repre-
sentations. We follow the grammar-based XML compression techniques, and we
discuss how an XML compression technique, called CluX, can be extended by
updates. Like the big majority of the XML compression techniques (e.g.[1],[2],[3],
[5],[6],[7],[8],[9],[10],[11]), we assume that textual content of text nodes and of
attribute nodes is compressed and stored separately and focus here on the
compression of the structural part of an XML document.
Contributions: This paper proposes an approach to perform updates on grammar-
compressed XML data directly, i.e., without prior decompression of the compressed
data. Furthermore, our approach allows to perform several updates in parallel in such
a way that e.g. performing 100 updates in parallel is more than 70 times faster than
performing 100 updates sequentially.
We have implemented and evaluated updates on the compressed data. Our results
show that it is not only possible to perform parallel updates on the CluX compressed
data directly, but furthermore that in many cases, these updates can be performed in
less time than it would take to decompress the compressed data, load the XML
document, and compress the data again.
For simplicity of this presentation, we restrict it to XML documents containing
only element nodes, i.e., attributes are regarded as special element types. Note
 
Search WWH ::




Custom Search