Computing Compressed XML Data from Relational Databases - Advances in Databases


existence of which can be deduced from an XML schema, we do not even generate those XML nodes of an SQL/XML query that would later be removed by XML compression. In XSDS [1], the non-generated nodes are the XML nodes predefined by an XML schema definition, whereas here they are the XML nodes that the SQL/XML query would otherwise generate. Omitting these nodes significantly reduces the overhead of the XML data that the SQL/XML query generates from the relational data. Note that the compressed XML document can be decompressed to XML on demand, and XPath queries can be answered directly on the compressed document by the XSDS decompressor described in [1].

The result of an SQL/XML query is a single XML document consisting of the XML structure (i.e., the element tags and attributes) on the one hand and the text data (i.e., the text and attribute values) on the other hand. We store all text values in document order in a single text container that is compressed via gzip.
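
The separation of text values from structure can be illustrated with a minimal Python sketch. It is not the implementation described here: the NUL separator between values, the helper names, and the use of an already materialized XML string as input are assumptions made only for illustration (the approach itself avoids generating the full XML in the first place).

```python
import gzip
import xml.etree.ElementTree as ET

SEP = "\x00"  # assumed separator between values; not specified in the text


def text_values(elem):
    """Yield attribute values and text nodes of elem's subtree in document order."""
    yield from elem.attrib.values()          # attribute values come with the start tag
    if elem.text and elem.text.strip():
        yield elem.text                      # text before the first child
    for child in elem:
        yield from text_values(child)
        if child.tail and child.tail.strip():
            yield child.tail                 # text following the child's end tag


def build_text_container(xml_string):
    """Collect all text values in document order and compress them with gzip."""
    root = ET.fromstring(xml_string)
    values = list(text_values(root))
    return gzip.compress(SEP.join(values).encode("utf-8"))


def read_text_container(blob):
    """Inverse of build_text_container: return the list of text values."""
    data = gzip.decompress(blob).decode("utf-8")
    return data.split(SEP) if data else []
```

Keeping all values in one container lets gzip exploit redundancy across values, while the structure can be stored and queried separately.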

Concerning the XML document structure, we store the 'fixed' part of the structure, i.e., the part that can be derived from the query without any knowledge of the data, within an XML schema. For the example given in Table 2, we know, for instance, that each element with label 'nation' has an attribute named 'name' and contains any number of elements with label 'customer'. If the SQL/XML query is known at the receiver's side, the schema can even be generated there, so there is no need to transfer the schema. In addition to the fixed part of the document that is stored in the XML schema, we need to store the variant parts of the XML document. Whenever there is a nested query or a call to the function XMLAGG, the number of elements to be created depends on the data stored in the relational database, i.e., the number varies. Therefore, we have to store, in our compressed data format, the number of occurrences of the elements generated by a nested query or by the function XMLAGG.
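
To see why the occurrence counts suffice, consider the following Python sketch, which rebuilds a document from the fixed template, a list of counts, and the text values. The root element 'nations', the assumption that each 'customer' element holds a single text value, and the flat list encoding of the counts are all hypothetical choices for this illustration; the actual compressed format may differ.

```python
from xml.sax.saxutils import escape, quoteattr


def rebuild(counts, texts):
    """Rebuild a nation/customer document from occurrence counts and text values.

    counts: number of 'nation' elements, followed by the number of
            'customer' elements of each nation (assumed encoding).
    texts:  all text values in document order.
    """
    counts = iter(counts)
    texts = iter(texts)
    parts = ["<nations>"]                      # hypothetical root element
    for _ in range(next(counts)):              # variant part: how many nations
        parts.append(f"<nation name={quoteattr(next(texts))}>")
        for _ in range(next(counts)):          # variant part: customers per nation
            parts.append(f"<customer>{escape(next(texts))}</customer>")
        parts.append("</nation>")
    parts.append("</nations>")
    return "".join(parts)
```

Everything hard-coded in `rebuild` corresponds to the fixed part that a schema can describe; only the two inputs depend on the relational data.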

We use the following two steps to generate the compressed XML data as the result of an SQL/XML query:

In the first step, we analyze the SQL/XML query to compute a set of templates that are repeated substructures within the compressed document's structure. To this end, we compute the set of templates in the form of an XML schema of the result document based on the SQL/XML query alone.
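
The idea of deriving a template from the query alone can be sketched in Python as follows. The classes `XmlElement` and `XmlAgg` are a hypothetical, much simplified model of the XMLELEMENT/XMLATTRIBUTES and XMLAGG constructs, and the output is a compact DTD-like content model rather than an actual XML schema; both are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class XmlElement:
    """Toy model of XMLELEMENT(NAME label, XMLATTRIBUTES(...), children...)."""
    label: str
    attributes: list = field(default_factory=list)
    children: list = field(default_factory=list)


@dataclass
class XmlAgg:
    """Toy model of XMLAGG(...) or a nested query: the child may repeat."""
    child: XmlElement


def content_model(node):
    """Derive a DTD-like content model from the query tree, without any data."""
    if isinstance(node, XmlAgg):
        # The number of occurrences is data dependent, so mark it as repeatable.
        return content_model(node.child) + "*"
    parts = ["@" + a for a in node.attributes]
    parts += [content_model(c) for c in node.children]
    return node.label + ("(" + ", ".join(parts) + ")" if parts else "")


query = XmlElement("nation", ["name"], [XmlAgg(XmlElement("customer"))])
print(content_model(query))
```

Each `*` in the derived model marks a position whose occurrence count has to be taken from the relational data in the second step.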

In the second step, we query the relational data to determine how the templates generated in the first step have to be combined to form the complete structure of the document. We do this by constructing an SQL query that retrieves the text values and from which the compressed document structure can be computed. At the same time, we use the results of this query to compute the text values of the result document.
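
A minimal sketch of this second step, using Python's built-in sqlite3 module, is given below. The tables, columns, and sample rows are hypothetical, and the flat join is only one possible shape for such a query, not necessarily the one constructed by the approach described here. It shows how a single plain SQL result can yield both the occurrence counts and the text values.

```python
import sqlite3
from itertools import groupby


def make_demo_db():
    """Create an in-memory database with hypothetical nation/customer tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nation (nationkey INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE customer (custkey INTEGER PRIMARY KEY, name TEXT,
                               nationkey INTEGER);
        INSERT INTO nation VALUES (1, 'FRANCE'), (2, 'GERMANY'), (3, 'PERU');
        INSERT INTO customer VALUES (10, 'Alice', 1), (11, 'Bob', 1),
                                    (12, 'Carol', 2);
    """)
    return conn


def fetch_rows(conn):
    """Plain SQL query: one row per (nation, customer), in document order."""
    return conn.execute("""
        SELECT n.nationkey, n.name, c.name
        FROM nation n LEFT JOIN customer c ON c.nationkey = n.nationkey
        ORDER BY n.nationkey, c.custkey
    """).fetchall()


def structure_and_texts(rows):
    """Compute occurrence counts and text values from the flat query result."""
    per_nation, texts = [], []
    for _, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        texts.append(group[0][1])                        # value of the 'name' attribute
        customers = [r[2] for r in group if r[2] is not None]
        per_nation.append(len(customers))                # customers of this nation
        texts.extend(customers)
    return [len(per_nation)] + per_nation, texts
```

The LEFT JOIN keeps nations without customers, so their count of zero is still recorded in the structure.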

The output of these two steps is an XML schema on the one hand and, on the other hand, the compressed XML document, which contains the document structure in compressed format together with the compressed text values.

3 Retrieving Compressed XML Data from a Relational Database

3.1 Generating the XML Schema for the SQL/XML Query Result

In this first step, we analyze the SQL/XML query and compute the XML schema against which the resulting document will be valid. For the schema generation,
