Relational Techniques for Storing and Querying RDF Data - Advanced Database Query Systems


information about the characteristics of the objects in the RDF dataset. Such explicit information

is not always available, and the process of inferring such implicit information introduces an

additional cost of a pre-processing phase. Such challenges call for new techniques for flexible

designs of property table encoding schemes.

CONCLUDING REMARKS

RDF is a main foundation for processing semantic information stored on the Web. It is the data model

behind the Semantic Web vision whose goal is to enable integration and sharing of data across different

applications and organizations. The naive way to store a set of RDF statements is to use a relational

database with a single table that has columns for subject, property, and object. While simple, this schema

quickly hits scalability limitations. Therefore, several approaches have been proposed to deal with this

limitation by using an extensive set of indexes or by using selectivity estimation information to optimize

the join ordering (Neumann & Weikum, 2008; Weiss et al., 2008).
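
To make the self-join problem concrete, the following is a minimal sketch of the single triple table, written with Python's built-in sqlite3 module. The table name `triples`, the sample resources, and the query are illustrative assumptions and are not taken from any of the cited systems.

```python
import sqlite3

# Hypothetical sample data; all identifiers are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("book1", "title", "Database Systems"),
        ("book1", "author", "Smith"),
        ("book1", "year", "2005"),
        ("book2", "title", "Semantic Web"),
        ("book2", "author", "Jones"),
    ],
)

# Fetching two properties of the same subject already needs one self-join;
# every further property in the query pattern adds another join on the
# same table.
rows = conn.execute(
    """
    SELECT t.object AS title, a.object AS author
    FROM triples AS t
    JOIN triples AS a ON a.subject = t.subject
    WHERE t.property = 'title' AND a.property = 'author'
    ORDER BY t.subject
    """
).fetchall()
print(rows)
```

In this sketch, a query over k properties of one subject requires k - 1 self-joins over the whole table, which is the cost that the indexing and join-ordering approaches mentioned above try to reduce.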

Another approach to reduce the self-join problem is to create separate tables (property tables) for

subjects that tend to have common properties defined (Chong et al., 2005; Levandoski & Mokbel, 2009).

Since Semantic Web data is often semi-structured, storing this data in a row-store can result in very

sparse tables as more subjects or properties are added. Hence, this normalization technique is typically

limited to resources that share a similar set of properties, and many small tables are usually created. The

problem is that queries may then need union and join clauses, since information about a particular

subject may be located in many different property tables. This may complicate the plan generator and

query optimizer and can degrade performance.
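
The property table idea and its drawbacks can be sketched in the same way. The example below is only an illustration under assumed names: the tables `book_props` and `article_props`, their columns, and the data are hypothetical, and real systems choose the table layout by clustering or by type information.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical property tables: one per group of subjects that tend to
# share the same properties, with one column per property.
conn.execute(
    "CREATE TABLE book_props "
    "(subject TEXT PRIMARY KEY, title TEXT, author TEXT, year TEXT)"
)
conn.execute(
    "CREATE TABLE article_props "
    "(subject TEXT PRIMARY KEY, title TEXT, journal TEXT)"
)
conn.execute(
    "INSERT INTO book_props VALUES ('book1', 'Database Systems', 'Smith', '2005')"
)
# book2 has no year, so the column stays NULL (the source of sparsity).
conn.execute(
    "INSERT INTO book_props VALUES ('book2', 'Semantic Web', 'Jones', NULL)"
)
conn.execute(
    "INSERT INTO article_props VALUES ('art1', 'RDF Storage', 'VLDB Journal')"
)

# Two properties of one subject come from a single row: no self-join.
same_table = conn.execute(
    "SELECT title, author FROM book_props WHERE subject = 'book1'"
).fetchall()

# A query that is not bound to one kind of subject must combine every
# property table that has the requested column.
all_titles = conn.execute(
    """
    SELECT subject, title FROM book_props
    UNION ALL
    SELECT subject, title FROM article_props
    ORDER BY subject
    """
).fetchall()

null_years = conn.execute(
    "SELECT COUNT(*) FROM book_props WHERE year IS NULL"
).fetchone()[0]
print(same_table, all_titles, null_years)
```

The first query avoids the self-join entirely, while the second shows how a question as simple as "all titles" grows one UNION branch per property table.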

Abadi et al. (2009) have explored the trade-off between triple-based stores and binary table-based

stores of RDF data. The main advantages of binary tables are:

1. Improved bandwidth utilization: In a column-store, only those attributes that are accessed by a

query need to be read off disk. In a row-store, surrounding attributes also need to be read since an

attribute is generally smaller than the smallest granularity in which data can be accessed.

2. Improved data compression: Storing data from the same attribute domain together increases

locality and thus data compression ratio. Hence, bandwidth requirements are further reduced when

transferring compressed data.
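
The compression argument can be illustrated with a small sketch. Run-length encoding is used here only as a simple stand-in for a compression scheme; the text above does not name a specific one, and the rows and attribute values are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Hypothetical rows of (subject, type, language).
rows = [
    ("s1", "Book", "en"),
    ("s2", "Book", "en"),
    ("s3", "Book", "en"),
    ("s4", "Article", "en"),
    ("s5", "Article", "en"),
]

# Row-oriented layout: values from different attributes are interleaved,
# so equal values are rarely adjacent.
row_layout = [value for row in rows for value in row]

# Column-oriented layout: all values of one attribute are stored together.
type_column = [row[1] for row in rows]

row_runs = run_length_encode(row_layout)
column_runs = run_length_encode(type_column)
print(len(row_runs), column_runs)
```

With this data the interleaved layout yields no runs longer than one, whereas the `type` column collapses into two runs, which is the locality effect described in the second advantage.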

On the other hand, binary tables have the following main disadvantages:

1. Increased cost of inserts: Column-stores perform poorly for insert queries since multiple distinct

locations on disk have to be updated for each inserted tuple (one for each attribute).

2. Increased tuple reconstruction costs: In order for column-stores to offer a standards-compliant

relational database interface (e.g., ODBC or JDBC), they must at some point in a query plan

stitch values from multiple columns together into a row-store style tuple to be output from the

database.
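
Both disadvantages can be seen in a minimal sketch of the binary table layout, again using sqlite3. SQLite is a row-store, so this only mimics the logical schema (one two-column table per property), not the physical storage of a column-store; the table names with the `p_` prefix and the helper `insert_resource` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical property names; one (subject, object) table per property.
properties = ["title", "author", "year"]
for prop in properties:
    # Table names come from the fixed list above, never from user input.
    conn.execute(f"CREATE TABLE p_{prop} (subject TEXT, object TEXT)")


def insert_resource(subject, values):
    """Insert one logical tuple and return the number of tables written."""
    writes = 0
    for prop, obj in values.items():
        conn.execute(f"INSERT INTO p_{prop} VALUES (?, ?)", (subject, obj))
        writes += 1
    return writes


# Insert cost: one logical tuple touches one table per attribute.
writes = insert_resource(
    "book1",
    {"title": "Database Systems", "author": "Smith", "year": "2005"},
)

# Tuple reconstruction: the values are stitched back into a row by
# joining the binary tables on the subject.
reconstructed = conn.execute(
    """
    SELECT p_title.subject, p_title.object, p_author.object, p_year.object
    FROM p_title
    JOIN p_author ON p_author.subject = p_title.subject
    JOIN p_year ON p_year.subject = p_title.subject
    """
).fetchall()
print(writes, reconstructed)
```

Inserting a single resource with three properties results in three separate writes, and returning it as one row requires two joins, which mirrors the insert and reconstruction costs listed above.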

Abadi et al. (2009) reported that the performance of binary tables is superior to that of clustered property

tables, while Sidirourgos et al. (2008) reported that even in a column-store database, the performance of
