information about the characteristics of the objects in the RDF dataset. Such explicit information is not always available, and inferring it introduces the additional cost of a pre-processing phase. These challenges call for new techniques that allow more flexible designs of property-table encoding schemes.
CONCLUDING REMARKS
RDF is a key foundation for processing semantic information stored on the Web. It is the data model behind the Semantic Web vision, whose goal is to enable the integration and sharing of data across different applications and organizations. The naive way to store a set of RDF statements is to use a relational database with a single table containing columns for subject, property, and object. While simple, this schema quickly hits scalability limitations, because a query with several triple patterns turns into many self-joins over the same large table. Therefore, several approaches have been proposed to address this limitation, either by using an extensive set of indexes or by using selectivity estimation information to optimize join ordering (Neumann & Weikum, 2008; Weiss et al., 2008).
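To make the self-join issue concrete, here is a minimal sketch. It assumes Python's built-in `sqlite3` module and a few hypothetical `ex:` statements. All statements live in one `triples` table, so two triple patterns on the same subject become a self-join. The three permuted composite indexes only loosely echo the index-heavy approaches cited above, which maintain all six orderings plus further machinery. The index names and helper functions are illustrative assumptions.

```python
import sqlite3

# Hypothetical sample data: RDF statements as (subject, property, object).
TRIPLES = [
    ("ex:book1", "ex:title", "Databases"),
    ("ex:book1", "ex:author", "ex:alice"),
    ("ex:book2", "ex:title", "Semantic Web"),
    ("ex:book2", "ex:author", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
]

def build_triple_store(triples):
    """Store every statement in one three-column table (the naive schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    # Illustrative only: a few permuted composite indexes so lookups can start
    # from a bound subject, property, or object.
    conn.execute("CREATE INDEX idx_spo ON triples (subject, property, object)")
    conn.execute("CREATE INDEX idx_pos ON triples (property, object, subject)")
    conn.execute("CREATE INDEX idx_osp ON triples (object, subject, property)")
    return conn

def titles_by_author(conn, author):
    """Patterns (?b ex:author author) and (?b ex:title ?t) need a self-join on subject."""
    sql = """
        SELECT t2.object
        FROM triples AS t1
        JOIN triples AS t2 ON t1.subject = t2.subject
        WHERE t1.property = 'ex:author' AND t1.object = ?
          AND t2.property = 'ex:title'
    """
    return [row[0] for row in conn.execute(sql, (author,))]

conn = build_triple_store(TRIPLES)
print(titles_by_author(conn, "ex:alice"))  # ['Databases']
```

Each additional triple pattern adds another copy of `triples` to the join. This is why join ordering and index choice matter so much for this schema.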
Another approach to reducing the self-join problem is to create separate tables (property tables) for subjects that tend to share a common set of properties (Chong et al., 2005; Levandoski & Mokbel, 2009). Since Semantic Web data is often semi-structured, storing it in row-store tables can produce very sparse tables (many NULL values) as more subjects or properties are added. Hence, this normalization technique is typically limited to resources that share a similar set of properties, and many small tables are usually created. The problem is that queries may then require union and join clauses, since information about a particular subject may be spread across many different property tables. This can complicate the work of the plan generator and query optimizer and can degrade performance.
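The following sketch again assumes `sqlite3` and hypothetical data. It clusters subjects by their exact set of properties and creates one property table per cluster. This grouping rule is a deliberately simple assumption, not the clustering method of the works cited above, and it also assumes every property is single-valued. Because `ex:title` appears in two clusters, even a query for all titles needs a `UNION ALL`.

```python
import sqlite3
from collections import defaultdict

# Hypothetical data: subjects with overlapping but different property sets.
TRIPLES = [
    ("ex:book1", "ex:title", "Databases"),
    ("ex:book1", "ex:year", "2008"),
    ("ex:book2", "ex:title", "Semantic Web"),
    ("ex:book2", "ex:year", "2009"),
    ("ex:paper1", "ex:title", "RDF Storage"),
    ("ex:paper1", "ex:venue", "VLDB"),
]

def column_name(prop):
    """Hypothetical mapping from a prefixed property name to a SQL column name."""
    return prop.replace(":", "_")

def build_property_tables(triples):
    """Create one property table per distinct property set (single-valued properties)."""
    subjects = defaultdict(dict)
    for s, p, o in triples:
        subjects[s][p] = o
    clusters = defaultdict(list)
    for s, props in subjects.items():
        clusters[tuple(sorted(props))].append(s)

    conn = sqlite3.connect(":memory:")
    tables = {}
    for i, (prop_set, members) in enumerate(sorted(clusters.items())):
        name = f"pt_{i}"
        col_defs = ", ".join(f"{column_name(p)} TEXT" for p in prop_set)
        conn.execute(f"CREATE TABLE {name} (subject TEXT PRIMARY KEY, {col_defs})")
        placeholders = ", ".join(["?"] * (len(prop_set) + 1))
        for s in members:
            conn.execute(f"INSERT INTO {name} VALUES ({placeholders})",
                         [s, *(subjects[s][p] for p in prop_set)])
        tables[name] = prop_set
    return conn, tables

def all_titles(conn, tables):
    """A query on ex:title must UNION every property table that has that column."""
    parts = [f"SELECT subject, {column_name('ex:title')} FROM {name}"
             for name, prop_set in tables.items() if "ex:title" in prop_set]
    return conn.execute(" UNION ALL ".join(parts) + " ORDER BY subject").fetchall()

conn, tables = build_property_tables(TRIPLES)
print(tables)
print(all_titles(conn, tables))
```

A single wide table with columns `ex_title`, `ex_year`, and `ex_venue` would avoid the union. However, it would leave a NULL in every row, which is the sparsity trade-off described above.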
Abadi et al. (2009) explored the trade-off between triple-based stores and binary-table-based stores of RDF data. The main advantages of binary tables are:
1. Improved bandwidth utilization: In a column-store, only the attributes accessed by a query need to be read off disk. In a row-store, surrounding attributes also need to be read, since an attribute is generally smaller than the smallest granularity at which data can be accessed.
2. Improved data compression: Storing data from the same attribute domain together increases locality and thus the data compression ratio. Hence, bandwidth requirements are further reduced when transferring compressed data.
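The compression benefit of grouping values by attribute can be illustrated with run-length encoding, one common lightweight scheme in column-stores. The minimal sketch below uses hypothetical data and counts runs rather than bytes. It encodes the same rows once in row-major order and once column by column.

```python
from itertools import groupby

def run_length_encode(values):
    """Encode a sequence as (value, run_length) pairs of adjacent equal values."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

# Hypothetical (subject, property, object) rows.
ROWS = [
    ("ex:s1", "rdf:type", "ex:Book"),
    ("ex:s2", "rdf:type", "ex:Book"),
    ("ex:s3", "rdf:type", "ex:Book"),
    ("ex:s4", "rdf:type", "ex:Paper"),
]

def row_major_runs(rows):
    """Row layout: values of different attributes are interleaved."""
    flat = [value for row in rows for value in row]
    return len(run_length_encode(flat))

def column_major_runs(rows):
    """Column layout: each attribute's values are stored (and encoded) together."""
    return sum(len(run_length_encode(column)) for column in zip(*rows))

print(row_major_runs(ROWS))     # 12
print(column_major_runs(ROWS))  # 7
```

In the row layout no two adjacent values are equal, so nothing compresses. In the column layout the repeated property and object values collapse into a few runs.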
On the other hand, binary tables have the following main disadvantages:
1. Increased cost of inserts: Column-stores perform poorly for inserts, since multiple distinct locations on disk have to be updated for each inserted tuple (one for each attribute).
2. Increased tuple reconstruction costs: In order to offer a standards-compliant relational database interface (e.g., ODBC, JDBC, etc.), column-stores must at some point in a query plan stitch values from multiple columns together into a row-store-style tuple to be output from the database.
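Both disadvantages can be sketched with a logical model of binary tables: one two-column `(subject, object)` table per property. This minimal sketch assumes `sqlite3`, hypothetical data, and hypothetical table names. Because SQLite is itself a row-store, the sketch models only the schema and query shape, not on-disk column-store behavior. Inserting one logical record writes to one table per property, and rebuilding row-style tuples requires one join per additional property.

```python
import sqlite3

# Hypothetical sample data.
TRIPLES = [
    ("ex:book1", "ex:title", "Databases"),
    ("ex:book1", "ex:year", "2008"),
    ("ex:book2", "ex:title", "Semantic Web"),
    ("ex:book2", "ex:year", "2009"),
]

def table_for(prop):
    """Hypothetical naming convention: one binary table per property."""
    return "p_" + prop.replace(":", "_")

def build_binary_tables(triples):
    """Create one (subject, object) table per property and load the triples."""
    conn = sqlite3.connect(":memory:")
    for prop in sorted({p for _, p, _ in triples}):
        conn.execute(f"CREATE TABLE {table_for(prop)} (subject TEXT, object TEXT)")
    for s, p, o in triples:
        conn.execute(f"INSERT INTO {table_for(p)} VALUES (?, ?)", (s, o))
    return conn

def insert_record(conn, subject, props):
    """One logical record becomes one write per property table (tables must exist)."""
    for prop, obj in props.items():
        conn.execute(f"INSERT INTO {table_for(prop)} VALUES (?, ?)", (subject, obj))
    return len(props)

def reconstruct(conn, props):
    """Tuple reconstruction: join the binary tables on subject into row-style tuples."""
    first, *rest = props
    cols = ", ".join(f"t{i}.object" for i in range(len(props)))
    sql = f"SELECT t0.subject, {cols} FROM {table_for(first)} AS t0"
    for i, prop in enumerate(rest, start=1):
        sql += f" JOIN {table_for(prop)} AS t{i} ON t{i}.subject = t0.subject"
    return conn.execute(sql + " ORDER BY t0.subject").fetchall()

conn = build_binary_tables(TRIPLES)
print(insert_record(conn, "ex:book3", {"ex:title": "Column Stores", "ex:year": "2009"}))  # 2
for row in reconstruct(conn, ["ex:title", "ex:year"]):
    print(row)
```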
Abadi et al. (2009) reported that binary tables outperform clustered property tables, while Sidirourgos et al. (2008) reported that even in a column-store database, the performance of