[Figure: the query Q (left) and its decomposition into the subqueries Q_1, Q_2 and Q_3 (right), drawn as graph patterns over the variables ?P1, ?P2, ?A and ?T, with edge labels hasAuthor, hasSupervisor, hasTitle and publishedIn and the constant node Journal1.]
Fig. 4. Query decomposition
Theorem 1 suggests the following general query evaluation strategy:
Step 1: Decompose the data graph G into a set P = (G_1, ..., G_m) of segments that are stored in different computer nodes.
Step 2: Decompose the query Q into a set F = (Q_1, ..., Q_n) of subqueries.
Step 3: Compute locally (at each node keeping a segment G_i) all possible useful partial embeddings of each subquery Q_j.
Step 4: For each subquery Q_j, collect and join the partial embeddings of Q_j obtained in Step 3 to get the total embeddings of Q_j.
Step 5: To construct the total embeddings (i.e. answers) of Q, join the total embeddings obtained in Step 4, using one embedding for each subquery.
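To make the strategy concrete, the following is a minimal, centralized Python sketch (an illustration only, not the algorithm of Section 5). It assumes a toy representation in which a segment is a collection of (subject, predicate, object) triples, a subquery is a list of triple patterns whose variables start with "?", and an embedding is a dictionary mapping variables to graph nodes; for brevity it folds Steps 3 and 4 into a single per-subquery join over all segments. All function names are illustrative.

def match_pattern(pattern, segment):
    """Partial embeddings of a single triple pattern over one segment (Step 3)."""
    embeddings = []
    for triple in segment:
        binding = {}
        compatible = True
        for p_term, g_term in zip(pattern, triple):
            if p_term.startswith("?"):              # variable: bind it consistently
                if binding.get(p_term, g_term) != g_term:
                    compatible = False
                    break
                binding[p_term] = g_term
            elif p_term != g_term:                  # constant: must match exactly
                compatible = False
                break
        if compatible:
            embeddings.append(binding)
    return embeddings

def join(e1, e2):
    """Merge two embeddings if they agree on all common variables, else None."""
    if all(e1[v] == e2[v] for v in e1.keys() & e2.keys()):
        return {**e1, **e2}
    return None

def evaluate(subqueries, segments):
    """Steps 3-5: total embeddings per subquery, then one join across subqueries."""
    answers = [{}]
    for subquery in subqueries:
        sub_embeddings = [{}]                       # total embeddings of this subquery
        for pattern in subquery:                    # Steps 3-4 (folded together)
            matches = [e for seg in segments for e in match_pattern(pattern, seg)]
            sub_embeddings = [j for a in sub_embeddings for b in matches
                              if (j := join(a, b)) is not None]
        answers = [j for a in answers for b in sub_embeddings      # Step 5
                   if (j := join(a, b)) is not None]
    return answers

# Hypothetical toy call: one pattern evaluated over two segments.
# evaluate([[("?P1", "hasAuthor", "?A")]], [{("n1", "hasAuthor", "n3")}, set()])
# -> [{'?P1': 'n1', '?A': 'n3'}]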
Example 5. (Continued from Example 4) Consider the decomposition of the query Q appearing in the right part of Fig. 4. The answers appearing in Example 4 can be found by applying the queries Q_1, Q_2 and Q_3 on the segments G_1, G_2 and G_3 and joining the embeddings obtained in this way.
5 An Algorithm to Answer Queries Using Map-Reduce
In this section we present an algorithm that implements the query evaluation strategy of Section 4 on the MapReduce programming framework.
5.1 MapReduce Framework
MapReduce is a programming model for processing large datasets in a distributed manner. The storage layer for the MapReduce framework is a Distributed File System (DFS), such as the Hadoop Distributed File System (HDFS), and is characterized by its block size, which is typically 16-128 MB in most DFSs. Creating a MapReduce job is straightforward. The user defines two functions, the Map and the Reduce function, which run in each cluster node, in
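As a rough illustration of the model, the following is a self-contained Python simulation of the Map and Reduce phases (not the Hadoop API; run_mapreduce and the word-count functions are illustrative names), using the canonical word-count computation.

from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: each input record is turned into (key, value) pairs.
    intermediate = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            intermediate[key].append(value)         # shuffle: group values by key
    # Reduce phase: each key and its list of values is reduced to output pairs.
    output = []
    for key, values in intermediate.items():
        output.extend(reduce_fn(key, values))
    return output

def word_count_map(line):
    for word in line.split():
        yield word, 1

def word_count_reduce(word, counts):
    yield word, sum(counts)

print(run_mapreduce(["to be or not to be"], word_count_map, word_count_reduce))
# [('to', 2), ('be', 2), ('or', 1), ('not', 1)]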
 