[Figure: the query Q (left) and its decomposition into the subqueries Q_1, Q_2 and Q_3 (right), drawn as graph patterns over the variables ?P1, ?P2, ?A and ?T, with edge labels hasAuthor, hasSupervisor, hasTitle and publishedIn and the constant node Journal1.]
Fig. 4. Query decomposition
Theorem 1 suggests the following general query evaluation strategy:
Step 1: Decompose the data graph G into a set P = (G_1, ..., G_m) of segments that are stored in different computer nodes.
Step 2: Decompose the query Q into a set F = (Q_1, ..., Q_n) of subqueries.
Step 3: Compute locally (at each node keeping a segment G_i) all possible useful partial embeddings of each subquery Q_j.
Step 4: For each subquery Q_j, collect and join the partial embeddings of Q_j obtained in Step 3 to get the total embeddings of Q_j.
Step 5: To construct the total embeddings (i.e. answers) of Q, join the total embeddings obtained in Step 4, using one embedding for each subquery.
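To make the strategy concrete, the following is a minimal, centralized Python sketch (an illustration only, not the algorithm of Section 5). It assumes a toy representation in which a segment is a collection of (subject, predicate, object) triples, a subquery is a list of triple patterns whose variables start with "?", and an embedding is a dictionary mapping variables to graph nodes; for brevity it folds Steps 3 and 4 into a single per-subquery join over all segments. All function names are illustrative.

def match_pattern(pattern, segment):
    """Partial embeddings of a single triple pattern over one segment (Step 3)."""
    embeddings = []
    for triple in segment:
        binding = {}
        compatible = True
        for p_term, g_term in zip(pattern, triple):
            if p_term.startswith("?"):              # variable: bind it consistently
                if binding.get(p_term, g_term) != g_term:
                    compatible = False
                    break
                binding[p_term] = g_term
            elif p_term != g_term:                  # constant: must match exactly
                compatible = False
                break
        if compatible:
            embeddings.append(binding)
    return embeddings

def join(e1, e2):
    """Merge two embeddings if they agree on all common variables, else None."""
    if all(e1[v] == e2[v] for v in e1.keys() & e2.keys()):
        return {**e1, **e2}
    return None

def evaluate(subqueries, segments):
    """Steps 3-5: total embeddings per subquery, then one join across subqueries."""
    answers = [{}]
    for subquery in subqueries:
        sub_embeddings = [{}]                       # total embeddings of this subquery
        for pattern in subquery:                    # Steps 3-4 (folded together)
            matches = [e for seg in segments for e in match_pattern(pattern, seg)]
            sub_embeddings = [j for a in sub_embeddings for b in matches
                              if (j := join(a, b)) is not None]
        answers = [j for a in answers for b in sub_embeddings      # Step 5
                   if (j := join(a, b)) is not None]
    return answers

# Hypothetical toy call: one pattern evaluated over two segments.
# evaluate([[("?P1", "hasAuthor", "?A")]], [{("n1", "hasAuthor", "n3")}, set()])
# -> [{'?P1': 'n1', '?A': 'n3'}]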
Example 5. (Continued from Example 4) Consider the decomposition of the query Q appearing in the right part of Fig. 4. The answers appearing in Example 4 can be found by applying the queries Q_1, Q_2 and Q_3 on the segments G_1, G_2 and G_3 and joining the embeddings obtained in this way.
5 An Algorithm to Answer Queries Using Map-Reduce
In this section we present an algorithm that implements the query evaluation strategy of Section 4 on the MapReduce programming framework.
5.1 MapReduce Framework
MapReduce is a programming model for processing large datasets in a distributed manner. The storage layer for the MapReduce framework is a Distributed File System (DFS), such as the Hadoop Distributed File System (HDFS), and is characterized by its block size, which is typically 16-128 MB in most DFSs. Creating a MapReduce job is straightforward. The user defines two functions, the Map and the Reduce function, which run in each cluster node, in
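As a rough illustration of the model, the following is a self-contained Python simulation of the Map and Reduce phases (not the Hadoop API; run_mapreduce and the word-count functions are illustrative names), using the canonical word-count computation.

from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    # Map phase: each input record is turned into (key, value) pairs.
    intermediate = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            intermediate[key].append(value)         # shuffle: group values by key
    # Reduce phase: each key and its list of values is reduced to output pairs.
    output = []
    for key, values in intermediate.items():
        output.extend(reduce_fn(key, values))
    return output

def word_count_map(line):
    for word in line.split():
        yield word, 1

def word_count_reduce(word, counts):
    yield word, sum(counts)

print(run_mapreduce(["to be or not to be"], word_count_map, word_count_reduce))
# [('to', 2), ('be', 2), ('or', 1), ('not', 1)]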
 