Large-Scale RDF Processing with MapReduce - Large Scale and Big Data: Processing and Management


TABLE 5.6

Query Execution Times for PigSPARQL (P) and MAPSIN (M) (in seconds)

LUBM: 1000, 1500, 2000, 2500, 3000

324

475

634

790

944

324

480

642

805

961

1202

121

1758

167

2368

182

2919

235

3496

279

Q4 MJ

861

1297

1728

2173

2613

329

484

640

800

955

149

214

284

355

424

104

1013

1480

1985

2472

114

2928

123

1172

1731

2318

2870

108

3431

121

Q11

319

469

620

780

931

Q13

325

482

645

800

108

957

128

Q14

149

214

288

364

434

107

aspect of distributed systems, it is crucial to examine additional measures for future optimizations.

Overall, the MAPSIN join approach clearly outperforms reduce-side-join-based query execution for selective queries. Both approaches scale linearly with the input size, but the slope for MAPSIN joins is much smaller. For the LUBM queries in particular, MAPSIN joins outperform reduce-side joins by an order of magnitude, as these queries are generally fairly selective. Moreover, applying the multiway join optimization further reduces the total query execution times significantly.
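The difference between the two join strategies can be illustrated with a small single-process sketch. It is not the MAPSIN implementation and rests on several assumptions: a Python dict keyed by subject stands in for the distributed index (HBase in the MAPSIN approach), the MapReduce phases are modeled as ordinary loops, the join variable is assumed to sit in subject position, and the counters `shuffled` and `lookups` are only crude cost proxies. All function names (`reduce_side_join`, `mapsin_join`, `mapsin_multiway`) are hypothetical. The multiway variant assumes, as a simplification, that triple patterns sharing the subject variable can all be answered from one index row.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "type", "Student"),
    ("alice", "takesCourse", "db101"),
    ("alice", "memberOf", "dept1"),
    ("bob", "type", "Student"),
    ("bob", "takesCourse", "ml201"),
    ("bob", "memberOf", "dept2"),
    ("carol", "type", "Professor"),
    ("carol", "teacherOf", "db101"),
]


def match(pattern, triples):
    """Bindings for one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable bound to two different values
                binding[term] = value
            elif term != value:
                break  # constant term does not match
        else:
            results.append(binding)
    return results


def compatible(a, b):
    """Bindings are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())


def reduce_side_join(left, right, var):
    """Reduce-side join: both inputs are shuffled by the join key, then joined per key."""
    groups = defaultdict(lambda: ([], []))
    for b in left:
        groups[b[var]][0].append(b)
    for b in right:
        groups[b[var]][1].append(b)
    shuffled = len(left) + len(right)  # every binding of both inputs is shuffled
    joined = [{**lb, **rb} for lbs, rbs in groups.values()
              for lb in lbs for rb in rbs if compatible(lb, rb)]
    return joined, shuffled


def build_subject_index(triples):
    """Stand-in for the distributed index: a dict from subject to its triples."""
    index = defaultdict(list)
    for t in triples:
        index[t[0]].append(t)
    return index


def mapsin_join(left, pattern, index):
    """Map-side index nested loop join: fetch only the triples whose subject equals
    the already-bound join value, so nothing is shuffled."""
    var, joined, lookups = pattern[0], [], 0  # assumes join variable is the subject
    for b in left:
        lookups += 1
        for m in match(pattern, index.get(b[var], [])):
            if compatible(b, m):
                joined.append({**b, **m})
    return joined, lookups


def mapsin_multiway(left, patterns, index):
    """Multiway variant: patterns sharing the subject variable are answered from
    a single lookup per input binding instead of one lookup per pattern."""
    var, joined, lookups = patterns[0][0], [], 0
    for b in left:
        lookups += 1
        row = index.get(b[var], [])  # one lookup serves all patterns
        partial = [b]
        for pattern in patterns:
            partial = [{**p, **m} for p in partial
                       for m in match(pattern, row) if compatible(p, m)]
        joined.extend(partial)
    return joined, lookups


# Usage: join students with their courses, then also with their departments.
students = match(("?x", "type", "Student"), TRIPLES)
index = build_subject_index(TRIPLES)
courses = match(("?x", "takesCourse", "?c"), TRIPLES)

rs_result, shuffled = reduce_side_join(students, courses, "?x")
ms_result, lookups = mapsin_join(students, ("?x", "takesCourse", "?c"), index)
cascaded, more_lookups = mapsin_join(ms_result, ("?x", "memberOf", "?d"), index)
multi, multi_lookups = mapsin_multiway(
    students, [("?x", "takesCourse", "?c"), ("?x", "memberOf", "?d")], index)
```

In this toy run both single-pattern joins return the same two bindings, but the reduce-side join shuffles all four input bindings, whereas the map-side join performs one index lookup per student. The multiway variant needs two lookups where cascading two single-pattern joins needs four. The more selective the query, the fewer bindings reach the lookup step, which is consistent with the observation that map-side joins pay off most for selective queries.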

5.8 RELATED WORK

Single-machine RDF systems such as Sesame [27] and Jena [28] are widely used because they are user-friendly and perform well on small and medium-sized RDF data sets. RDF-3X [29] is considered one of the fastest single-machine RDF systems in terms of query performance and vastly outperforms earlier single-machine systems, but its performance degrades for queries with unbound objects and a low selectivity factor [30]. Furthermore, as the amount of RDF data continues to grow, storing entire data sets on a single machine will become increasingly difficult because of its limited scaling capabilities [3].

A translation from SPARQL to Pig Latin has already been mentioned in [31]; however, the authors provide no further information or technical details about it. To the best of our knowledge, we present the first detailed and comprehensive translation from SPARQL to Pig Latin that also considers optimizations at different levels and is evaluated with a SPARQL performance benchmark that includes queries with the SPARQL-specific OPTIONAL operator.
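OPTIONAL is what distinguishes such a translation from a plain chain of inner joins: in the SPARQL algebra it corresponds to a left outer join, so solutions of the required part survive even when the optional part has no match. The sketch below shows these semantics over in-memory solution mappings, assuming dicts from variable names to values and ignoring FILTER conditions inside OPTIONAL; the helper names `compatible` and `optional` are hypothetical, and this is not the Pig Latin that PigSPARQL generates. In Pig Latin, the natural counterpart is a `JOIN` with the `LEFT OUTER` keyword.

```python
def compatible(m1, m2):
    """Solution mappings are compatible if they agree on every shared variable."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def optional(left, right):
    """OPTIONAL as a left outer join (SPARQL LeftJoin without a FILTER):
    every left mapping survives, extended by each compatible right mapping if any."""
    result = []
    for m1 in left:
        extended = [{**m1, **m2} for m2 in right if compatible(m1, m2)]
        result.extend(extended if extended else [m1])
    return result


# Hypothetical data: every student, plus an e-mail address where one is known.
students = [{"?x": "alice"}, {"?x": "bob"}]
emails = [{"?x": "alice", "?e": "alice@example.org"}]

solutions = optional(students, emails)
inner = [{**a, **b} for a in students for b in emails if compatible(a, b)]
```

An inner join keeps only `alice`; the left outer join also keeps `bob` with `?e` left unbound, which is the behavior OPTIONAL requires.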

The authors of [32] also consider the execution of SPARQL queries on Hadoop. In contrast to our approach, a query is directly mapped into a sequence
