Algorithm 6.4: The Extended POTGGroupPackage
Reduce(key: Sub, val: List of tuples T);
foreach tup (s, p, o) ∈ T do
    set p in locBitSet;
    add (p, o) to tempMap;
matchedList = match(locBitSet, ECList);
if (|matchedList| > 1) then
    // Ambiguous TripleGroup
    foreach EC ∈ matchedList do
        propMap ← cloneMap(tempMap, EC.propList);
        emit RDFMap(Sub, EC, propMap);
else
    // Perfect TripleGroup
    emit RDFMap(Sub, matchedList[0], tempMap);
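The reduce step of Algorithm 6.4 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the set loc_props stands in for locBitSet, and the helpers match and clone_map, the dictionary layout of an equivalence class (name, propList), and the subset-based matching rule are all assumptions made for the example. The sketch also skips subjects that match no equivalence class, a case the algorithm does not spell out.

```python
from collections import defaultdict


def match(props, ec_list):
    """Hypothetical matcher (assumption): an equivalence class (EC) matches
    when every property in its propList occurs for the subject."""
    return [ec for ec in ec_list if set(ec["propList"]) <= props]


def clone_map(temp_map, prop_list):
    """Copy only the (property -> objects) entries that belong to one EC."""
    return {p: list(temp_map[p]) for p in prop_list if p in temp_map}


def reduce_subject(sub, tuples, ec_list):
    """Group the triples of one subject and emit one record per matching EC."""
    loc_props = set()             # plays the role of locBitSet
    temp_map = defaultdict(list)  # property -> list of objects

    for _s, p, o in tuples:
        loc_props.add(p)
        temp_map[p].append(o)

    matched = match(loc_props, ec_list)
    emitted = []
    if len(matched) > 1:
        # Ambiguous TripleGroup: emit one projected map per matching EC.
        for ec in matched:
            emitted.append((sub, ec["name"], clone_map(temp_map, ec["propList"])))
    elif matched:
        # Perfect TripleGroup: exactly one EC matched, emit the full map.
        emitted.append((sub, matched[0]["name"], dict(temp_map)))
    # Assumption: a subject that matches no EC emits nothing.
    return emitted
```

Under these assumptions, a subject with the properties type, publisher, and label matches both an EC over (type, publisher) and an EC over (type, label), so it is treated as ambiguous and emitted twice, each time with only the properties of the corresponding EC.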
6.8.2.1 Setup and Testbed
The evaluation was conducted on a 10-node Hadoop cluster with the BSBM-250k
data set (approximately 86M triples with 250k Products; 22 GB). Five queries
(dq0 to dq4), each containing two star patterns, are considered, with varying numbers
of repeated properties (from 0 to 4, respectively) in the second star subpattern.
Figure 6.14 shows the graph representation of queries dq0 and dq4 (black and gray
edges denote an arbitrary unique property and a repeated property, respectively).
The queries include the following DupPs: dq0 (none), dq1 (publisher), dq2 (publisher,
type), dq3 (publisher, type, label), and dq4 (publisher, type, label, date). To
evaluate scalability with increasing data size, four BSBM data sets were used:
BSBM-{250k, 500k, 750k, 1000k}, ranging in size from 22 GB (BSBM-250k) to
86 GB (BSBM-1000k).
6.8.2.2 Varying Number of Repeated Properties across a Query
Figure 6.15a shows the execution time and the number of bytes read from HDFS
using the three approaches. In general, SHARD results in the highest execution time and
FIGURE 6.14
Graph representation of the example queries dq0 and dq4 (edge labels: :type, :publisher, :name, :date).
 