Algebraic Optimization of RDF Graph Pattern Queries on MapReduce - Large Scale and Big Data: Processing and Management - page 223

(a) Varying multiplicity of MV property
[Bar chart. Series: Pig-Opt, Hive, NTGA. Vertical axis: 0 to 3500 (label not recoverable). Queries: low-1Star, high-1Star, base-2Star, low-2Star, high-2Star.]

(b) Redundancy factor across the MapReduce workflow

Query        MR S1            MR S1 S2        MR S2
Low-1Star    0.72 (1.5 GB)    -               -
High-1Star   0.82 (5.2 GB)    -               -
Base-2Star   0 (0.2 GB)       0 (7.7 GB)      0 (12.7 GB)
Low-2Star    0.72 (1.6 GB)    0 (7.7 GB)      0.78 (75.8 GB)
High-2Star   0.82 (5.4 GB)    0 (7.7 GB)      0.89 (250 GB)

(c) Varying density of star-joins
[Bar chart. Series: Pig-Opt, Hive, NTGA-Opt. Vertical axis: 0 to 2500 (label not recoverable). Queries: MV-2p, MV-3p, MV-4p, MV-5p.]

FIGURE 6.20 (a) Comparative evaluation using one and two star subpattern queries containing low and high multiplicity MV property, (b) redundancy factor in reduce output while evaluating test queries using flat algebra, (c) impact of lazy unnesting strategy with increasing cardinality of star-joins (BSBM-500k, 10-node).

6.10 CONCLUDING REMARKS

This chapter discusses the challenges and strategies for RDF query processing on MapReduce platforms. The impetus for this research direction is the range of emerging applications that rely on increasing amounts of publicly available Semantic Web data as background knowledge for analysis. In many scenarios, the computational needs required to incorporate such large amounts of Semantic Web data in
