Relational Techniques for Storing and Querying RDF Data - Advanced Database Query Systems

All reported numbers of the query performance metric are the average of five executions with the highest and the lowest values removed, that is, the mean of the three remaining readings. The rationale behind this is that the first reading of each query is always considerably slower than, and therefore inconsistent with, the other readings. This is because the relational database uses buffer pools as a caching mechanism. The initial period, during which the database spends its time loading pages into the buffer pools, is known as the warm-up period. During this period the response time of the database is longer than its normal response time. For all metrics, the lower the metric value, the better the approach.
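The averaging rule above is a trimmed mean: sort the five timings, drop the single highest and lowest, and average what is left. The following is a minimal sketch of that rule; the function name `trimmed_average` and the sample timings are hypothetical and are not measurements from the experiments.

```python
from statistics import mean
from typing import Sequence


def trimmed_average(timings: Sequence[float]) -> float:
    """Average of the runs left after dropping the single highest and lowest reading.

    With five executions this averages the three middle values, so a slow
    cold-buffer (warm-up) run that is the maximum never enters the result.
    """
    if len(timings) < 3:
        raise ValueError("need at least three timings to drop both extremes")
    ordered = sorted(timings)
    return mean(ordered[1:-1])


# Hypothetical response times in seconds; the first run is a warm-up outlier.
runs = [9.8, 2.1, 2.0, 2.3, 1.9]
print(trimmed_average(runs))  # mean of 2.0, 2.1 and 2.3
```

Dropping only one value at each end keeps the rule simple and still discards the warm-up reading whenever it is the slowest run.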

Experimental Results

Table 3 summarizes the loading times for shredding the different datasets into the alternative relational representations. The RS scheme is the fastest because it requires the fewest tuple insert operations. Similarly, TS requires less loading time than BS, since each triple results in fewer inserted tuples and fewer updated tables.
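The table-count side of this argument can be sketched with SQLite as a stand-in for the relational engine: TS writes every triple into one relation, whereas BS must route each triple to the table of its predicate and create that table the first time the predicate appears. The sample triples, the `bt_` table prefix and the loader functions are hypothetical illustrations, not the shredding code used in the experiments.

```python
import sqlite3

# Hypothetical sample triples (subject, predicate, object); not DBLP/SP2Bench data.
TRIPLES = [
    ("article1", "title", "Query Processing"),
    ("article1", "creator", "person1"),
    ("article1", "year", "2008"),
    ("article2", "title", "RDF Storage"),
    ("article2", "creator", "person2"),
    ("person1", "name", "Alice"),
]


def load_triple_store(conn, triples):
    """TS: every triple becomes one row of a single three-column table."""
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return {"triples"}


def load_binary_tables(conn, triples):
    """BS: one two-column (subject, object) table per distinct predicate."""
    tables = set()
    for s, p, o in triples:
        table = f"bt_{p}"  # assumes predicate names are safe SQL identifiers
        if table not in tables:
            conn.execute(f"CREATE TABLE {table} (subject TEXT, object TEXT)")
            tables.add(table)
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (s, o))
    return tables


ts_tables = load_triple_store(sqlite3.connect(":memory:"), TRIPLES)
bs_tables = load_binary_tables(sqlite3.connect(":memory:"), TRIPLES)
print(len(ts_tables), len(bs_tables))  # prints: 1 4
```

With real datasets the number of binary tables grows with the number of distinct predicates, so the loader touches many more tables than the single TS relation.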

Table 4 summarizes the storage cost of the alternative relational representations. The RS scheme is the cheapest approach because of its normalized design and the absence of any data redundancy. Because the sparsity of the DBLP dataset is limited, PS introduces no additional storage cost apart from a small overhead caused by repeating the object identification attributes in the decomposed property tables. The BS scheme is the most expensive approach because the ID attribute is repeated in every binary table. Note also that the storage costs of TS and BS are increased by the sizes of their associated indexes.
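The index overhead can be observed directly. The sketch below builds a triple table in SQLite (a stand-in, not the DBMS used in the experiments) with and without one B-tree index per permutation of (subject, predicate, object), which is one reading of the "full set" of permutation indexes credited to TS in the query results, and compares the allocated size via `PRAGMA page_count` and `PRAGMA page_size`. The synthetic triples and helper names are hypothetical, and the absolute sizes are not comparable to Table 4.

```python
import sqlite3
from itertools import permutations


def database_bytes(conn):
    """Allocated size of the database: page_count * page_size."""
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return pages * page_size


def build_triple_store(n_triples, with_indexes):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    # Synthetic triples, for illustration only.
    rows = [(f"subject{i % 500}", f"predicate{i % 20}", f"object{i}") for i in range(n_triples)]
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)
    if with_indexes:
        # One B-tree index per permutation: spo, sop, pso, pos, osp, ops.
        for cols in permutations(("s", "p", "o")):
            conn.execute(f"CREATE INDEX idx_{''.join(cols)} ON triples ({', '.join(cols)})")
    conn.commit()
    return conn


plain = database_bytes(build_triple_store(5000, with_indexes=False))
indexed = database_bytes(build_triple_store(5000, with_indexes=True))
print(f"without indexes: {plain} bytes, with six permutation indexes: {indexed} bytes")
```

Each permutation index stores a full copy of the three columns in a different order, which is why the indexed store ends up several times larger than the bare table.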

Table 5 summarizes the query performance of the SP²Bench benchmark queries over the alternative relational representations, using the different dataset sizes. The results of this experiment lead to the following remarks:

1. There is no clear winner between the triple store (TS) and the binary table (BS) encoding schemes. Despite its simple storage and the huge number of tuples in its single encoding relation, the triple store remains very competitive with the binary tables encoding scheme thanks to the full set of B-tree physical indexes over the permutations of the three encoding fields (subject, predicate, object).

2. The query performance of the BS encoding scheme degrades badly as the number of predicates in the input query grows, since each additional predicate adds another binary table to the join. It also suffers on subject-object and object-object joins, for which no index information is available to exploit. This problem could be addressed by building materialized views over the columns of the most frequently referenced pairs of attributes.
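The BS behaviour described in remark 2 can be sketched in SQLite. Each query predicate brings in another binary table, and linking an article's creator to that person's name is a subject-object join (`bt_creator.object = bt_name.subject`). SQLite has no `CREATE MATERIALIZED VIEW`, so the suggested materialized view is emulated here with a precomputed table plus an index; this emulation, the table names and the data are all hypothetical, and such a table must be refreshed whenever the base tables change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical binary tables: one (subject, object) table per predicate.
conn.executescript("""
    CREATE TABLE bt_title   (subject TEXT, object TEXT);
    CREATE TABLE bt_creator (subject TEXT, object TEXT);
    CREATE TABLE bt_name    (subject TEXT, object TEXT);
    INSERT INTO bt_title   VALUES ('article1', 'Query Processing'), ('article2', 'RDF Storage');
    INSERT INTO bt_creator VALUES ('article1', 'person1'), ('article2', 'person2');
    INSERT INTO bt_name    VALUES ('person1', 'Alice'), ('person2', 'Bob');
""")

# Titles of articles created by "Alice": three predicates, three binary tables,
# and c.object = n.subject is a subject-object join.
QUERY = """
    SELECT t.object
    FROM bt_title AS t
    JOIN bt_creator AS c ON c.subject = t.subject
    JOIN bt_name AS n ON n.subject = c.object
    WHERE n.object = 'Alice'
"""
print(conn.execute(QUERY).fetchall())  # [('Query Processing',)]

# Emulated materialized view over the frequently joined (creator, name) pair.
conn.executescript("""
    CREATE TABLE mv_creator_name AS
        SELECT c.subject AS article, n.object AS creator_name
        FROM bt_creator AS c JOIN bt_name AS n ON n.subject = c.object;
    CREATE INDEX idx_mv_creator_name ON mv_creator_name (creator_name);
""")
MV_QUERY = """
    SELECT t.object
    FROM mv_creator_name AS m
    JOIN bt_title AS t ON t.subject = m.article
    WHERE m.creator_name = 'Alice'
"""
print(conn.execute(MV_QUERY).fetchall())  # [('Query Processing',)]
```

The precomputed pair removes one join from the query and gives the subject-object link an index to use, at the cost of extra storage and refresh work.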

Table 3. A comparison between the alternative relational RDF storage techniques in terms of their loading times

Loading Time (in seconds)

Dataset   Triple Stores   Binary Tables   Traditional Relational   Property Tables
500K      282             306             212                      252
1M        577             586             402                      521
2M        1242            1393            931                      1176
4M        2881            2936            1845                     2406
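The orderings stated in the loading-time discussion can be checked against Table 3 with a few lines of arithmetic. This sketch copies the Table 3 values and, for each dataset, reports the fastest scheme and every scheme's slowdown relative to it. The last dataset size is taken to be 4M, an assumption based on the doubling sequence of dataset sizes; the dictionary name is hypothetical.

```python
# Loading times from Table 3, in seconds. The "4M" label is an assumption.
LOADING_TIMES = {
    "500K": {"TS": 282, "BS": 306, "RS": 212, "PS": 252},
    "1M": {"TS": 577, "BS": 586, "RS": 402, "PS": 521},
    "2M": {"TS": 1242, "BS": 1393, "RS": 931, "PS": 1176},
    "4M": {"TS": 2881, "BS": 2936, "RS": 1845, "PS": 2406},
}

for dataset, times in LOADING_TIMES.items():
    fastest = min(times, key=times.get)
    # Slowdown relative to the fastest scheme, e.g. 1.33 means 33% slower.
    ratios = {scheme: round(t / times[fastest], 2) for scheme, t in times.items()}
    print(dataset, fastest, ratios)
```

For every dataset size RS is the fastest, BS the slowest, and TS loads faster than BS, which matches the discussion of Table 3.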
