Algebraic Optimization of RDF Graph Pattern Queries on MapReduce - Large Scale and Big Data: Processing and Management - page 214

[Figure 6.15: plots not reproduced. Panel (a): queries dq 0 to dq 4 on the horizontal axis; series SHARD, Pig-Def, NTGA. Panel (b): series Pig-Def and NTGA over BSBM-250k (22 GB), BSBM-500k (43 GB), BSBM-750k (66 GB), BSBM-1000k (86 GB); percentage labels on the plot, in extraction order: 58%, 55%, 54.8%, 52.8%.]

FIGURE 6.15 (a) A comparative evaluation of the three approaches (SHARD, Pig-Def, NTGA) for the queries with repeated properties. (b) Scalability study of scan-sharing approach for query dq 4 with increasing sizes of RDF graphs.

I/O compared with the other two approaches. Further, the execution time in SHARD increases as the number of triple patterns increases from 8 to 12 across dq 0 to dq 4. Pig-Def performs relatively better than SHARD because the number of MR jobs for star joins is mainly affected by the number of star patterns in the queries. However, the amount of HDFS reads for all queries in Pig-Def is still larger than in NTGA because the DupPs are scanned and processed in both star-join cycles (MR1 and MR2 among the 3 MR jobs) in Pig-Def, which results in an increasing amount of HDFS bytes read as the number of DupPs increases. In NTGA, the execution time and the amount of HDFS reads do not change much with varying numbers of DupPs because the grouping-based star-join computation approach in NTGA enables scan sharing for DupPs while processing star subpatterns containing DupPs.
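
The difference in read volume described above can be illustrated with a small in-memory sketch. It is not the Pig-Def or NTGA implementation: the triples, the property names, and the functions `per_star_scans` and `shared_scan` are hypothetical, and the counter tracks triples read rather than HDFS bytes. The first function reads the triples of each star pattern in a separate pass, so a repeated property is read once per star; the second groups triples by subject in a single pass and then checks every star pattern against the groups.

```python
from collections import defaultdict

# Toy (subject, property, object) triples; all values are made up.
TRIPLES = [
    ("p1", "label", "Widget"),
    ("p1", "feature", "f1"),
    ("p1", "producer", "m1"),
    ("p2", "label", "Gadget"),
    ("p2", "feature", "f1"),
    ("p3", "feature", "f2"),
    ("p3", "producer", "m2"),
]

# Two star patterns; "feature" is the repeated property shared by both.
STARS = [frozenset({"label", "feature"}), frozenset({"feature", "producer"})]


def per_star_scans(triples, stars):
    """Evaluate each star pattern in its own pass over the input."""
    reads = 0
    matches = []
    for props in stars:
        seen = defaultdict(set)
        for subject, prop, _obj in triples:
            if prop in props:
                reads += 1  # a repeated property is read again for every star
                seen[subject].add(prop)
        matches.append({s for s, found in seen.items() if found == props})
    return matches, reads


def shared_scan(triples, stars):
    """Group triples by subject once, then test each group against all stars."""
    needed = frozenset().union(*stars)
    reads = 0
    groups = defaultdict(set)
    for subject, prop, _obj in triples:
        if prop in needed:
            reads += 1  # each relevant triple is read exactly once
            groups[subject].add(prop)
    matches = [{s for s, found in groups.items() if props <= found}
               for props in stars]
    return matches, reads


if __name__ == "__main__":
    print(per_star_scans(TRIPLES, STARS))
    print(shared_scan(TRIPLES, STARS))
```

Both functions return the same matching subjects for each star pattern. In this toy input the per-star version reads the three "feature" triples twice, while the shared version reads them once, which mirrors why the read volume grows with the number of repeated properties only in the per-star case.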

6.8.2.3 Varying Size of RDF Graphs

Figure 6.15b shows a comparative evaluation of the two approaches (Pig-Def and NTGA) with an increasing number of RDF triples. The NTGA approach scales well, with a performance gain between 52% and 58% across data sizes from BSBM-250k to BSBM-1000k. The gain varies because the number of triples containing repeated properties does not increase linearly with the size of the data sets.
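
The text does not define how the performance gain is computed. One common reading, assumed here, is the relative reduction in execution time against the baseline. The function name `relative_gain` and the timings in the usage line are hypothetical and are not values read from Figure 6.15.

```python
def relative_gain(baseline_seconds: float, improved_seconds: float) -> float:
    """Percentage reduction in execution time relative to the baseline.

    Assumed definition: 100 * (baseline - improved) / baseline.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return 100.0 * (baseline_seconds - improved_seconds) / baseline_seconds


if __name__ == "__main__":
    # Hypothetical timings chosen only to show the arithmetic.
    print(relative_gain(1000.0, 400.0))  # 60.0
```

Under this definition, a gain that stays in a narrow band while the data size roughly quadruples means the two execution times grow at similar rates.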
