Updates on Grammar-Compressed XML Data - Advances in Databases

Database Reference

In-Depth Information

Updates on Grammar-Compressed XML Data

Alexander Bätz, Stefan Böttcher, and Rita Hartel

University of Paderborn, Computer Science, Fürstenallee 11, 33102 Paderborn, Germany

{laures,stb,rst}@uni-paderborn.de

Abstract. In this paper, we present updates on CluX, a grammar-based XML

compression approach based on clustering XML sub-trees. We show that

updates on CluX-compressed data can be performed faster than decompressing

the data, loading it into main memory and compressing it. Furthermore, we

show how to support fast multiple updates, e.g. performing 100 updates in

parallel is more than 70 times faster than 100 single updates.

Keywords: updating compressed XML data, grammar-based compression.

1 Introduction

Motivation: XML is widely used in business applications and is the de facto standard

for information exchange among different enterprise information systems. Whenever

the amount of processed XML data is a bottleneck, it is desirable that applications can

directly query and update compressed XML data without having to decompress the

data before accessing it.

There have been different contributions to the field of XML compressors

generating queryable XML representations, that range from encoding-based [1] to

schema-based [2], [3] to DAG-based [4] to grammar-based [5] compressed repre-

sentations. We follow the grammar-based XML compression techniques, and we

discuss how an XML compression technique, called CluX, can be extended by

updates. Like the big majority of the XML compression techniques (e.g.[1],[2],[3],

[5],[6],[7],[8],[9],[10],[11]), we assume that textual content of text nodes and of

attribute nodes is compressed and stored separately and focus here on the

compression of the structural part of an XML document.

Contributions: This paper proposes an approach to perform updates on grammar-

compressed XML data directly, i.e., without prior decompression of the compressed

data. Furthermore, our approach allows to perform several updates in parallel in such

a way that e.g. performing 100 updates in parallel is more than 70 times faster than

performing 100 updates sequentially.

We have implemented and evaluated updates on the compressed data. Our results

show that it is not only possible to perform parallel updates on the CluX compressed

data directly, but furthermore that in many cases, these updates can be performed in

less time than it would take to decompress the compressed data, load the XML

document, and compress the data again.

For simplicity of this presentation, we restrict it to XML documents containing

only element nodes, i.e., attributes are regarded as special element types. Note

Search WWH ::

Custom Search

Home