Computing Compressed XML Data from Relational Databases - Advances in Databases


columns of tuples 1 to n-1 belonging to the output columns of Q2 (i.e., the values of all tuples of the inner query). If we consider the sub-sequence SSQ consisting of the tuples 7-9 of Table 4, we first write the value 'Spain' (i.e., the value of tuple 7 - the first tuple - belonging to the output columns of Q1) followed by the values C#023 and C#101 (i.e., the values of tuples 7 and 8, which are tuples 1 to n-1 of SSQ for its size n=3).

Retrieving the Compressed Document Structure. Let Q1 and Q2 be two nested sub-queries, where Q1 contains Q2. At the beginning of each sub-sequence of tuples within the result set belonging to Q1, we store a counter in the structure stream that is initialized with 0 and incremented whenever a tuple with the Query_ID of Q1 is read. Similarly, at the beginning of each nested sub-sequence of tuples within the result set belonging to Q2, we add a new counter cQ2 to the structure stream that is initialized with 0 and incremented whenever a tuple with the Query_ID of Q2 is read. The counter cQ2 is closed (i.e., it can no longer be incremented) whenever a tuple with the Query_ID of Q1 is read.
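The counter scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the XSDS implementation from [1]: the function name `build_structure_stream`, the representation of the structure stream as a list of (Query_ID, count) pairs, and the input as a flat list of Query_IDs are all hypothetical. The sketch handles a single sub-sequence belonging to Q1 and, following the text literally, adds no counter for a Q1 tuple that has no Q2 tuples after it.

```python
def build_structure_stream(query_ids, outer="Q1", inner="Q2"):
    """Build the counters for one Q1 sub-sequence of tuples.

    query_ids is the sequence of Query_IDs of the tuples, in result order.
    The return value lists the counters in the order they were created.
    """
    outer_counter = [outer, 0]   # created at the start of the Q1 sub-sequence
    stream = [outer_counter]
    open_inner = None            # the currently open cQ2 counter, if any

    for qid in query_ids:
        if qid == outer:
            outer_counter[1] += 1
            open_inner = None    # a Q1 tuple closes the open cQ2 counter
        elif qid == inner:
            if open_inner is None:
                # start of a nested Q2 sub-sequence: add a new counter at 0
                open_inner = [inner, 0]
                stream.append(open_inner)
            open_inner[1] += 1
        else:
            raise ValueError(f"unexpected Query_ID: {qid!r}")

    return [tuple(counter) for counter in stream]


if __name__ == "__main__":
    # Two Q1 tuples; the first is followed by two Q2 tuples, the second by one.
    print(build_structure_stream(["Q1", "Q2", "Q2", "Q1", "Q2"]))
```

For the input in the usage line, the Q1 counter ends at 2 and two cQ2 counters are created, the first closed at 2 by the second Q1 tuple and the second ending at 1.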

If we apply this process to the result, shown in Table 3, of the query given in Fig. 2, we get exactly the compressed XML document given in Fig. 3.

Remember that decompression back to XML and querying the compressed document can be done by the XSDS decompressor described in [1].

4 Evaluation

We have evaluated our approach using the database systems Oracle 10g Express and IBM DB2 Express. As both have shown similar results, we concentrate on the DB2 results within this evaluation section.

We have used the TPC-H benchmark (http://www.tpc.org/tpch/) to create a relational database. We have tested 5 different kinds of queries that select customers sorted by nation (CN4 and CN16), customer data (C400 and C3200), article data (A4 and A16), supplier data (S4 and S16) and order data including customer and supplier information (O2 and O4). Each of these queries contains a range clause within the WHERE clause, such that the result size can be scaled.
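To illustrate how a range clause lets the result size be scaled, the following minimal sketch builds a parameterized SQL/XML query over the TPC-H CUSTOMER table. The query text, the choice of C_CUSTKEY as the range column, and the helper name `scaled_customer_query` are assumptions made for illustration; the actual queries used in the evaluation are not given in this section.

```python
def scaled_customer_query(max_custkey):
    """Return a query and its parameters selecting customers 1..max_custkey.

    Widening the key range enlarges the result, which is one possible way
    to scale the result size through the WHERE clause.
    """
    if max_custkey < 1:
        raise ValueError("max_custkey must be at least 1")
    sql = (
        'SELECT XMLELEMENT(NAME "customer", C_NAME) '
        "FROM CUSTOMER "
        "WHERE C_CUSTKEY BETWEEN ? AND ?"   # the range clause
    )
    return sql, (1, max_custkey)


if __name__ == "__main__":
    sql, params = scaled_customer_query(400)
    print(sql)
    print(params)
```

The two parameter markers would be bound by the database driver, so the same statement can be reused for a small and a large variant of a query.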

For the evaluation of the compression ratio reached by XSDS, please refer to [1].

Fig. 4 shows the query evaluation times for our set of queries, in relation to the SQL/XML query evaluation time (100%), for the indirect approach (i.e., evaluating the SQL/XML query and then compressing the result) on the one hand and for the direct approach (i.e., generating compressed XML directly from the SQL/XML query and the relational data) on the other hand. We can see that our approach not only takes less time to compute the compressed data directly than the total time of the indirect approach, but that for all queries tested, it can even compute the compressed data directly in less time than the SQL/XML query evaluation alone takes. Furthermore, we can see that our approach scales better for larger result sets: for each pair of queries that carry the same initial letters and differ in result size (e.g., S4 and the 4 times larger S16), the performance gain compared to the query evaluation time is greater for the larger result.
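The normalization used in this comparison, with the SQL/XML query evaluation time as the 100% baseline, can be written down as a short sketch. The function name `relative_times` and its parameters are hypothetical, and the numbers in the usage line are made-up placeholders, not measurements from the evaluation.

```python
def relative_times(sqlxml_time, compression_time, direct_time):
    """Express both approaches as a percentage of the SQL/XML evaluation time.

    indirect: evaluate the SQL/XML query, then compress the result
    direct:   generate the compressed XML directly
    """
    if sqlxml_time <= 0:
        raise ValueError("sqlxml_time must be positive")
    indirect = 100.0 * (sqlxml_time + compression_time) / sqlxml_time
    direct = 100.0 * direct_time / sqlxml_time
    return indirect, direct


if __name__ == "__main__":
    # Placeholder timings in seconds, chosen only to show the arithmetic.
    print(relative_times(2.0, 1.0, 1.5))
```

With this normalization the indirect approach is always above 100% whenever compression takes any time at all, while a value below 100% for the direct approach means that it is faster than the SQL/XML query evaluation alone.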
