5.7 MAPSIN EVALUATION
The evaluation was performed on the same cluster as the evaluation in Section
5.4, but we increased the RAM of every server to 8 GB because HBase is
memory-intensive. We used HBase version 0.90.4.
We used the well-known Lehigh University Benchmark (LUBM) [11] because its
queries can easily be formulated as SPARQL basic graph patterns (BGPs). The
generated data sets ranged from 1000 to 3000 universities, and we used the WebPIE
inference engine for Hadoop [26] to precompute the transitive closure. The loading
times of both tables, T_s_po and T_o_ps, for all data sets are listed in Table 5.5.
Figure 5.8 compares the performance of PigSPARQL and MAPSIN for selected LUBM
queries that represent the different query types. Our proof-of-concept
implementation is currently limited to at most two join variables,
as the goal was to demonstrate the feasibility of the approach for selective queries
rather than to support all possible BGP constellations. For a detailed comparison, the
runtimes of all executed queries are listed in Table 5.6.
LUBM queries Q1, Q3, Q5, Q11, and Q13 demonstrate the base case with a single join
between two triple patterns (cf. Figure 5.8a). MAPSIN joins performed 8 to 13 times
faster than the reduce-side joins of PigSPARQL. Furthermore, the performance
gain increases with the size of the data set.
LUBM queries Q4 (5 triple patterns), Q7 (4 triple patterns), and Q8 (5 triple
patterns) demonstrate the more general case with a sequence of cascaded joins (cf.
Figure 5.8b). In these cases, MAPSIN joins perform up to 28 times faster than
PigSPARQL. Of particular interest is query Q4 of LUBM, since it supports the
multiway join optimization outlined in Section 5.6.3, as all triple patterns share the
same join variable. This kind of optimization is also supported by PigSPARQL, so
that both approaches can compute the query results with a single multiway join (cf.
Figure 5.8c). The MAPSIN multiway join optimization improves the basic MAPSIN
join execution time by a factor of 3.3 (LUBM Q4), independently of the data size.
Moreover, the MAPSIN multiway join optimization performs 19 to 28 times faster
than the reduce-side multiway join implementation of PigSPARQL.
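
The condition that makes Q4 eligible for a multiway join (all triple patterns share the same join variable) can be checked mechanically. The following Python sketch is only an illustration and is not part of the MAPSIN or PigSPARQL implementations: the tuple representation of a BGP, the helper names, and the chain-shaped counterexample are assumptions made here, and Q4 is written out as it appears in the published LUBM benchmark with prefixes abbreviated.

```python
# Illustrative sketch only: a BGP is modeled as a list of
# (subject, predicate, object) tuples; terms starting with "?" are variables.

# LUBM Q4 as given in the published benchmark (prefixes abbreviated).
Q4 = [
    ("?X", "rdf:type", "ub:Professor"),
    ("?X", "ub:worksFor", "<http://www.Department0.University0.edu>"),
    ("?X", "ub:name", "?Y1"),
    ("?X", "ub:emailAddress", "?Y2"),
    ("?X", "ub:telephone", "?Y3"),
]

# Hypothetical chain-shaped BGP (not a LUBM query), used as a counterexample.
CHAIN = [
    ("?X", "ub:advisor", "?Y"),
    ("?Y", "ub:teacherOf", "?Z"),
    ("?Z", "rdf:type", "ub:Course"),
]


def variables(pattern):
    """Return the set of variables occurring in one triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def shared_join_variables(bgp):
    """Return the variables that occur in every triple pattern of the BGP."""
    if not bgp:
        return set()
    common = variables(bgp[0])
    for pattern in bgp[1:]:
        common &= variables(pattern)
    return common


def is_multiway_candidate(bgp):
    """True if more than two patterns all share at least one variable."""
    return len(bgp) > 2 and bool(shared_join_variables(bgp))


if __name__ == "__main__":
    print(shared_join_variables(Q4), is_multiway_candidate(Q4))
    print(shared_join_variables(CHAIN), is_multiway_candidate(CHAIN))
```

For Q4 the only variable common to all five patterns is ?X, so the whole pattern can in principle be evaluated as one multiway join. The chain-shaped pattern has no variable common to all patterns and would need cascaded joins instead.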
The remaining queries (LUBM Q6, Q14) consist of a single triple pattern.
Consequently, they do not contain a join processing step and primarily illustrate the
advantages of the distributed HBase table scan compared with the HDFS storage
TABLE 5.5
LUBM Loading Times for Tables T_s_po and T_o_ps (hh:mm:ss)

LUBM            1000           1500           2000           2500           3000
# RDF triples   ~210 million   ~315 million   ~420 million   ~525 million   ~630 million
T_s_po          00:28:50       00:42:10       00:52:03       00:56:00       01:05:25
T_o_ps          00:48:57       01:14:59       01:21:53       01:38:52       01:34:22
Total           01:17:47       01:57:09       02:13:56       02:34:52       02:39:47
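
The figures in Table 5.5 can be cross-checked with a few lines of arithmetic. The Python sketch below is an illustration added for the reader, not part of the original evaluation: the values are transcribed from the table, the triple counts are the table's own approximations, and the function names and the derived triples-per-second measure are assumptions introduced here.

```python
# Illustrative sketch only: values transcribed from Table 5.5.
LOADING_TIMES = {
    # universities: (T_s_po, T_o_ps, Total), each as hh:mm:ss
    1000: ("00:28:50", "00:48:57", "01:17:47"),
    1500: ("00:42:10", "01:14:59", "01:57:09"),
    2000: ("00:52:03", "01:21:53", "02:13:56"),
    2500: ("00:56:00", "01:38:52", "02:34:52"),
    3000: ("01:05:25", "01:34:22", "02:39:47"),
}

# Approximate triple counts as stated in the table ("~210 million", ...).
APPROX_TRIPLES = {1000: 210e6, 1500: 315e6, 2000: 420e6, 2500: 525e6, 3000: 630e6}


def to_seconds(hms):
    """Convert an hh:mm:ss string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def total_is_consistent(universities):
    """Check that the Total row equals the sum of the two table loading times."""
    t_s_po, t_o_ps, total = LOADING_TIMES[universities]
    return to_seconds(t_s_po) + to_seconds(t_o_ps) == to_seconds(total)


def load_throughput(universities):
    """Approximate RDF triples per second of total loading time (both tables)."""
    total = LOADING_TIMES[universities][2]
    return APPROX_TRIPLES[universities] / to_seconds(total)


if __name__ == "__main__":
    for n in sorted(LOADING_TIMES):
        print(n, total_is_consistent(n), round(load_throughput(n)))
```

With these approximate counts, the Total row matches the sum of the two loading times for every data set, and the combined load rate works out to roughly 45,000 triples per second for LUBM 1000 and roughly 66,000 for LUBM 3000.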
 