TABLE 5.6
Query Execution Times for PigSPARQL (P) and MAPSIN (M) (in seconds)

LUBM         1000          1500          2000          2500          3000
            P     M       P     M       P     M       P     M       P     M
Q1        324    34     475    51     634    53     790    70     944    84
Q3        324    33     480    42     642    49     805    59     961    72
Q4       1202   121    1758   167    2368   182    2919   235    3496   279
Q4 MJ     861    37    1297    53    1728    62    2173    81    2613    92
Q5        329    33     484    44     640    53     800    66     955    80
Q6        149    48     214    60     284    69     355    84     424   104
Q7       1013    62    1480    68    1985    93    2472   114    2928   123
Q8       1172    64    1731    77    2318    33    2870   108    3431   121
Q11       319    33     469    46     620    53     780    69     931    79
Q13       325    44     482    72     645    84     800   108     957   128
Q14       149    43     214    70     288    79     364    89     434   107

aspect of distributed systems, it is crucial to examine additional measures for future
optimizations.
Overall, the MAPSIN join approach clearly outperforms reduce-side join-based
query execution for selective queries. Both approaches scale linearly with the
input size, but the slope of the MAPSIN join is much smaller. For LUBM queries in
particular, MAPSIN joins outperform reduce-side joins by an order of magnitude,
as these queries are generally rather selective. Moreover, applying the multiway
join optimization further improves the total query execution times significantly.
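
The scaling and speedup claims can be checked against the measurements in Table 5.6. The following is a minimal sketch, not part of the original evaluation: it uses the Q1 and Q4 rows of the table, assumes that the row labelled Q4 MJ is the multiway join variant of Q4, and approximates the slope from the first and last data points only. The function names speedups and endpoint_slope are illustrative.

```python
# Execution times in seconds, copied from Table 5.6.
SCALES = [1000, 1500, 2000, 2500, 3000]  # LUBM scale factors

PIGSPARQL_Q1 = [324, 475, 634, 790, 944]
MAPSIN_Q1 = [34, 51, 53, 70, 84]
MAPSIN_Q4 = [121, 167, 182, 235, 279]
MAPSIN_Q4_MJ = [37, 53, 62, 81, 92]  # assumed: Q4 with multiway join


def speedups(slower, faster):
    """Return the element-wise ratio slower/faster for two series."""
    if len(slower) != len(faster):
        raise ValueError("series must have the same length")
    return [s / f for s, f in zip(slower, faster)]


def endpoint_slope(scales, times):
    """Average growth in seconds per LUBM unit between the first and
    last measurement (a rough estimate, not a least-squares fit)."""
    return (times[-1] - times[0]) / (scales[-1] - scales[0])


if __name__ == "__main__":
    for scale, ratio in zip(SCALES, speedups(PIGSPARQL_Q1, MAPSIN_Q1)):
        print(f"Q1 at LUBM {scale}: PigSPARQL/MAPSIN = {ratio:.1f}x")

    print(f"Q1 slope PigSPARQL: {endpoint_slope(SCALES, PIGSPARQL_Q1):.3f} s/unit")
    print(f"Q1 slope MAPSIN:    {endpoint_slope(SCALES, MAPSIN_Q1):.3f} s/unit")

    for scale, ratio in zip(SCALES, speedups(MAPSIN_Q4, MAPSIN_Q4_MJ)):
        print(f"Q4 at LUBM {scale}: MAPSIN/MAPSIN MJ = {ratio:.1f}x")
```

For Q1, the ratios range from about 9.3 to 12.0, and the endpoint slopes are 0.31 versus 0.025 seconds per LUBM unit, which is consistent with the order-of-magnitude statement above. Less selective rows such as Q6 and Q14 show smaller ratios.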
5.8 RELATED WORK
Single-machine RDF systems such as Sesame [27] and Jena [28] are widely used, since
they are user-friendly and perform well for small- and medium-sized RDF data sets.
RDF-3X [29] is considered one of the fastest single-machine RDF systems in terms of
query performance and vastly outperforms previous single-machine systems, but its
performance degrades for queries with unbound objects and a low selectivity factor [30].
Furthermore, as the amount of RDF data continues to grow, it will become increasingly
difficult to store entire data sets on a single machine due to its limited scaling
capabilities [3].
A translation from SPARQL to Pig Latin has already been mentioned in [31].
However, the authors provide no further information or technical details about it. To
the best of our knowledge, we present the first detailed and comprehensive translation
from SPARQL to Pig Latin that also considers efficient optimizations on different
levels and is evaluated with a SPARQL performance benchmark that also
contains queries with the SPARQL-specific OPTIONAL operator.
The authors of [32] also consider the execution of SPARQL queries based on
Hadoop. In contrast to our approach, a query is directly mapped into a sequence
 