Large-Scale RDF Processing with MapReduce - Large Scale and Big Data: Processing and Management - page 168

TABLE 5.2

T_s_po Table for RDF Graph in Figure 5.6

Rowkey     Family:Column → Value
--------   ----------------------------------------------------------------------------------
Article1   p:title→{“PigSPARQL”}, p:year→{“2011”}, p:author→{Alex, Martin}
Article2   p:title→{“RDFPath”}, p:year→{“2011”}, p:author→{Martin, Alex}, p:cite→{Article1}

TABLE 5.3

T_o_ps Table for RDF Graph in Figure 5.6

Rowkey        Family:Column → Value
-----------   ------------------------------
“2011”        p:year→{Article1, Article2}
“PigSPARQL”   p:title→{Article1}
“RDFPath”     p:title→{Article2}
Alex          p:author→{Article1, Article2}
Article1      p:cite→{Article2}
Martin        p:author→{Article2, Article1}
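
The two layouts in Tables 5.2 and 5.3 can be reproduced in memory to see how each triple lands in both tables. The following is a minimal sketch, not the authors' loader: it uses plain Python dictionaries in place of HBase, the triple list is read off Table 5.2, and the names `TRIPLES` and `build_tables` are hypothetical. Literals and resources are both stored as plain strings for simplicity.

```python
from collections import defaultdict

# Triples of the example graph, read off Table 5.2 as (subject, predicate, object).
TRIPLES = [
    ("Article1", "title", "PigSPARQL"),
    ("Article1", "year", "2011"),
    ("Article1", "author", "Alex"),
    ("Article1", "author", "Martin"),
    ("Article2", "title", "RDFPath"),
    ("Article2", "year", "2011"),
    ("Article2", "author", "Martin"),
    ("Article2", "author", "Alex"),
    ("Article2", "cite", "Article1"),
]


def build_tables(triples):
    """Return (t_s_po, t_o_ps), each mapping rowkey -> {predicate -> set of values}."""
    t_s_po = defaultdict(lambda: defaultdict(set))
    t_o_ps = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        t_s_po[s][p].add(o)  # subject is the rowkey, objects are the values
        t_o_ps[o][p].add(s)  # object is the rowkey, subjects are the values
    return t_s_po, t_o_ps


t_s_po, t_o_ps = build_tables(TRIPLES)
```

Every triple is written twice, once per table, so a lookup by subject and a lookup by object are both single-row reads.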

[Figure: RDF graph with nodes Article1, Article2, Alex, Martin, "PigSPARQL", "RDFPath", and "2011", connected by edges labeled title, author, year, and cite.]

SPARQL BGP query

SELECT *
WHERE {
  ?article title ?title .
  ?article author ?author .
  ?article year ?year
}

FIGURE 5.6

RDF graph and SPARQL query.
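
To make the meaning of the query concrete, the sketch below evaluates it against rows shaped like the T_s_po table. It is an illustration under stated assumptions, not the evaluation strategy of the system described in the chapter: the rows are hand-copied from Table 5.2 into a Python dictionary, and `eval_star_bgp` and `PATTERNS` are hypothetical names. It relies on the fact that all three triple patterns share the subject variable ?article, so each row can be checked on its own.

```python
from itertools import product

# Rows hand-copied from Table 5.2: rowkey -> {predicate -> list of values}.
T_S_PO = {
    "Article1": {"title": ["PigSPARQL"], "year": ["2011"],
                 "author": ["Alex", "Martin"]},
    "Article2": {"title": ["RDFPath"], "year": ["2011"],
                 "author": ["Martin", "Alex"], "cite": ["Article1"]},
}

# Every pattern has ?article as subject: predicate -> object variable name.
PATTERNS = {"title": "title", "author": "author", "year": "year"}


def eval_star_bgp(table, patterns):
    """Yield one binding dict per solution of a subject-star BGP."""
    predicates = list(patterns)
    for subject, columns in table.items():
        # A row can only match if it has a column for every predicate.
        if not all(p in columns for p in predicates):
            continue
        # Multi-valued columns (two authors) produce one solution per value.
        for values in product(*(columns[p] for p in predicates)):
            binding = {"article": subject}
            for p, v in zip(predicates, values):
                binding[patterns[p]] = v
            yield binding


solutions = list(eval_star_bgp(T_S_PO, PATTERNS))
```

Each article has two authors, so the query yields two solutions per article and four in total.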

side such that no unnecessary data has to be transferred over the network (predicate push-down). As already mentioned in [25], a table with predicates as row keys causes scalability problems since the number of predicates in an ontology is usually fixed and relatively small, which results in a table with just a few very fat rows. Considering that all data in a row is stored on the same machine, the resources of a single machine in the cluster become a bottleneck. Indeed, if only the predicate in a triple pattern is given, we can use the HBase Filter API to answer this request with a table scan on T_s_po or T_o_ps using the predicate as column filter. Table 5.4 shows the mapping of every possible triple pattern to the corresponding HBase table. Overall, experiments on our cluster showed that the two-table schema with server-side filters
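
The predicate-only case described above can be sketched as a full scan that keeps only the matching column of each row. This is a plain-Python simulation, not a call into the HBase Filter API: the rows are hand-copied from Table 5.2, and `scan_with_column_filter` is a hypothetical name. The source does not say which HBase filter class is used.

```python
# Rows hand-copied from Table 5.2: rowkey -> {predicate -> list of values}.
T_S_PO = {
    "Article1": {"title": ["PigSPARQL"], "year": ["2011"],
                 "author": ["Alex", "Martin"]},
    "Article2": {"title": ["RDFPath"], "year": ["2011"],
                 "author": ["Martin", "Alex"], "cite": ["Article1"]},
}


def scan_with_column_filter(table, predicate):
    """Simulate a table scan on T_s_po that returns only one column per row."""
    for rowkey, columns in table.items():   # every row is visited: a full scan
        if predicate in columns:            # the filter drops all other columns
            for value in columns[predicate]:
                yield (rowkey, predicate, value)


# Triple pattern with only the predicate bound: ?s author ?o
author_triples = list(scan_with_column_filter(T_S_PO, "author"))
cite_triples = list(scan_with_column_filter(T_S_PO, "cite"))
```

In this simulation the filter runs in the same process as the caller. The point of a server-side filter is that the discarded columns never leave the machine that stores the row.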
