A Cloud-Based, Geospatial Linked Data Management System - Transactions on Large-Scale Data- and Knowledge-Centered Systems XX

Yars2 [ 7 ] is a federated semantic search engine for performing interactive

query answering over heterogeneous LD collected from many disparate Web

sources. The local indexing scheme adopted comprises: (a) keyword indices based

on Apache Lucene to enable keyword lookups, (b) full quad sparse indices, and

(c) join indices to speed up queries. For global-based indexing, three partitioning

methods are employed to decide on the node where a particular quad will be

indexed.

Mika and Tummarello [ 14 ] have produced a research prototype in the form

of a back-end for the Sesame Triple Store which exploits Pig to load and query

RDF data, where RDF loading is performed by converting RDF to Pig's data

model.

Tanimura et al. [ 22 ] have implemented a scalable RDF data processing framework

which exploits parallel database processing over the Google File System

(GFS). Hadoop is used as the basic infrastructure based on GFS and MapReduce

while Pig is used as the data processing platform. For efficient RDF querying,

a particular RDF storage scheme which combines vertical partitioning with

Hadoop's key-value data format was adopted.
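
Vertical partitioning groups triples by predicate, so that each predicate gets its own two-column (subject, object) table and a query touching one predicate reads only that table. The following is a minimal sketch of the idea, assuming in-memory Python dictionaries in place of Hadoop key-value files; the function names and the sample data are illustrative and are not taken from [ 22 ].

```python
from collections import defaultdict


def vertical_partition(triples):
    """Group (subject, predicate, object) triples into one table per predicate."""
    tables = defaultdict(list)
    for s, p, o in triples:
        # The predicate acts as the key; the value is a (subject, object) row.
        tables[p].append((s, o))
    return dict(tables)


def match_predicate(tables, predicate, subject=None, obj=None):
    """Scan only the table of the given predicate, filtering on optional bounds."""
    return [
        (s, o)
        for s, o in tables.get(predicate, [])
        if (subject is None or s == subject) and (obj is None or o == obj)
    ]


# Hypothetical sample data.
triples = [
    ("ex:athens", "rdf:type", "ex:City"),
    ("ex:athens", "ex:locatedIn", "ex:greece"),
    ("ex:patras", "rdf:type", "ex:City"),
]
tables = vertical_partition(triples)
cities = match_predicate(tables, "rdf:type", obj="ex:City")
```

Here the lookup for all cities reads the two rows stored under rdf:type and never touches the ex:locatedIn table.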

Husain et al. [ 9 ] have developed a scalable and fault-tolerant framework which

exploits a particular scheme for storing RDF data in the Hadoop Distributed File System

and supports data-intensive query processing.

An RDF storage and querying prototype system has been implemented in [ 21 ]

based on MapReduce and HBase. The realized storage scheme employs six HBase

tables to cover all RDF triple pattern combinations, while triples are indexed

through the HBase index structure on row key.
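
One way to read this scheme is as one table per ordering of subject, predicate and object, so that the bound terms of any triple pattern form a row-key prefix in some table. The sketch below works under that assumption and uses sorted Python lists in place of HBase tables; the separator, the function names and the sample data are hypothetical and do not reproduce the layout of [ 21 ].

```python
from bisect import bisect_left
from itertools import permutations

SEP = "\x00"  # assumed separator that never occurs inside a term


def build_indexes(triples):
    """Return one sorted list of row keys per ordering of (s, p, o)."""
    indexes = {}
    for order in permutations("spo"):
        keys = sorted(
            SEP.join(dict(zip("spo", t))[pos] for pos in order) for t in triples
        )
        indexes["".join(order)] = keys
    return indexes


def prefix_scan(keys, prefix_terms):
    """Return decoded rows whose key starts with the given (non-empty) terms."""
    prefix = SEP.join(prefix_terms)
    start = bisect_left(keys, prefix)  # jump to the first candidate key
    rows = []
    for key in keys[start:]:
        if key != prefix and not key.startswith(prefix + SEP):
            break  # keys are sorted, so no later key can match
        rows.append(tuple(key.split(SEP)))
    return rows


# Hypothetical sample data.
triples = [
    ("ex:athens", "rdf:type", "ex:City"),
    ("ex:athens", "ex:locatedIn", "ex:greece"),
    ("ex:patras", "rdf:type", "ex:City"),
]
indexes = build_indexes(triples)
# Pattern (?s, rdf:type, ex:City): predicate and object are bound, so use "pos".
hits = prefix_scan(indexes["pos"], ["rdf:type", "ex:City"])
```

Each returned row lists its terms in the order of the index that was scanned, here predicate, object, subject.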

A distributed RDF prototype store is presented in [ 17 ] based on MapReduce

and HBase. The storage scheme employs three indices to cover particular triple

pattern combinations stored in HBase tables in the form of key-value pairs.
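
The text does not name the three indices used in [ 17 ]. As an illustration only, the sketch below assumes the common cyclic choice SPO, POS and OSP, and shows how a query planner could pick an index whose leading key positions are exactly the bound positions of a triple pattern; the function name is hypothetical.

```python
INDEX_ORDERS = ("spo", "pos", "osp")  # assumed choice; not named in the source


def choose_index(bound):
    """Pick an index whose leading key positions are exactly the bound ones.

    `bound` is a set drawn from {"s", "p", "o"}.
    """
    for order in INDEX_ORDERS:
        if set(order[: len(bound)]) == set(bound):
            return order
    raise ValueError(f"no index covers bound positions {sorted(bound)}")


# Pattern (ex:athens, ?p, ex:greece): subject and object are bound.
index_for_s_o = choose_index({"s", "o"})
```

Under this assumption every one of the eight bound/unbound combinations is served by a key prefix of one of the three indices.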

Franke et al. [ 4 ] have implemented a prototype with two different distributed

RDF storage schemes based on HBase and MySQL Cluster, respectively. The

HBase database schema relies on creating two tables for storing RDF triples,

while the MySQL-based scheme relies on a simple table whose columns hold

the triple subjects, predicates and objects.
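
The single-table layout can be sketched with a three-column relation. The example below uses Python's built-in sqlite3 module as a stand-in for MySQL Cluster, which is an assumption made only to keep the sketch self-contained; the table name, the column names and the sample rows are illustrative and are not taken from [ 4 ].

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL Cluster back end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:athens", "rdf:type", "ex:City"),
        ("ex:athens", "ex:locatedIn", "ex:greece"),
        ("ex:patras", "rdf:type", "ex:City"),
    ],
)

# Triple pattern (?s, rdf:type, ex:City) expressed as a selection on two columns.
rows = conn.execute(
    "SELECT subject FROM triples "
    "WHERE predicate = ? AND object = ? ORDER BY subject",
    ("rdf:type", "ex:City"),
).fetchall()
```

Every triple pattern maps to a selection over the same table, which keeps the schema trivial at the cost of self-joins for multi-pattern queries.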

An extension of the RAPID prototype system is proposed in [ 18 ] which relies

on Pig and Hadoop and exploits Pig Latin as the high-level language to support

ad hoc processing and querying over large datasets.

An RDF molecule-based store has been realized in [ 16 ] by exploiting Hadoop

to scale out the distributed query processing. A number of extensions with

respect to molecule hierarchy and structure are proposed to the initial molecule

definition to resolve particular query performance issues.

Proprietary Approaches. Dydra (www.dydra.com) is a multi-tenant, cloud-based graph database

deployed on the Amazon Cloud, which exhibits various features, such as

versioning and disaster recovery. RDF data are stored as a property graph which

directly represents the relationships between them.
