An Algorithm for Querying Linked Data Using Map-Reduce - Data Management in Cloud, Grid and P2P Systems


This mapper produces instances of the branching-node tuples in E by replacing the '*' in the 3rd place with a value in V. Among the key-value pairs obtained and emitted in this way are:

key = (Person4,Article1,Person1), value = (Q1, < *,"Title1" > )

key = (Person2,Article2,Person3), value = (Q1, < *,"Title2" > )

The input of the Mapper with key Q2 is:

E= { ( < *,Article1,Person1 > , < Journal1,* > ),

( < *,Article2,Person3 > , < Journal1,* > ), ... }

V= { (1,Person2), (1,Person4), ... }

Some of the instances produced and emitted by this mapper are:

key = (Person4,Article1,Person1), value = (Q2, < Journal1,* > )

key = (Person2,Article2,Person3), value = (Q2, < Journal1,* > )

The input of the Mapper with key Q3 is:

E= { ( < Person4,*,Person1 > , < *,* > ), ( < Person2,*,Person3 > , < *,* > ) }

V= { (2,Article1), (2,Article2) }

Some key-value pairs produced and emitted (as above) by this mapper are:

key = (Person4,Article1,Person1), value = (Q3, < *,* > )

key = (Person2,Article2,Person3), value = (Q3, < *,* > )
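The Phase 2 mapper described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Hadoop code: the function name `mapper2`, the encoding of E and V as Python tuples, the 1-based positions in V, and the check that the targeted place actually holds '*' are all assumptions of this sketch. The usage at the end feeds it only the listed elements of the Q3 input.

```python
from typing import Iterable, Iterator

STAR = "*"  # wildcard marking a node not bound by this subquery's tuple

# Assumed encoding (not from the paper): tuples of strings for branching-node
# tuples and partial embeddings; V holds (1-based position, value) pairs.
Pattern = tuple[str, ...]
Embedding = tuple[str, ...]


def mapper2(subquery: str,
            E: Iterable[tuple[Pattern, Embedding]],
            V: Iterable[tuple[int, str]]) -> Iterator[tuple[Pattern, tuple[str, Embedding]]]:
    """Instantiate each branching-node tuple of E with each value in V (hypothetical)."""
    V = list(V)  # V is scanned once per element of E
    for pattern, embedding in E:
        for position, node_value in V:
            idx = position - 1  # convert the 1-based place to a tuple index
            if pattern[idx] != STAR:
                continue  # defensive: only fill a place that holds '*'
            key = pattern[:idx] + (node_value,) + pattern[idx + 1:]
            # key: fully bound branching-node tuple; value: (Qi, partial embedding)
            yield key, (subquery, embedding)


# Listed elements of the Q3 input from the example above
E3 = [(("Person4", STAR, "Person1"), (STAR, STAR)),
      (("Person2", STAR, "Person3"), (STAR, STAR))]
V3 = [(2, "Article1"), (2, "Article2")]
for key, value in mapper2("Q3", E3, V3):
    print(key, value)
```

Because this sketch pairs every tuple in E with every value in V, the Q3 input yields four pairs: the two listed above plus instances keyed (Person4,Article2,Person1) and (Person2,Article1,Person3).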

Reducer of Phase 2. In each reducer, the embeddings (one for each subquery in (Q1, ..., Qn)) are joined³ to construct the final answers of Q:

reducer2 (key, values)
// key: a tuple of branching-node values
// values: pairs of the form (Qi, partial embedding for non-branching nodes)
begin
  for each join obtained by using one embedding for each subquery do
    emit the result produced by this join
end.
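A corresponding Python sketch of reducer2 is given below; it is hypothetical, not the authors' code. The helper `merge`, the extra `subqueries` parameter listing Q1, ..., Qn, and the representation of partial embeddings as tuples with '*' for unbound places are assumptions. One embedding per subquery is chosen with `itertools.product`, and the chosen embeddings are merged place by place; the compatibility check is only defensive, since footnote 3 states that joined embeddings are compatible by construction.

```python
from collections import defaultdict
from itertools import product
from typing import Iterable, Iterator, Optional, Sequence

STAR = "*"  # wildcard for a non-branching node not bound by a subquery
Embedding = tuple[str, ...]


def merge(embeddings: Sequence[Embedding]) -> Optional[Embedding]:
    """Join partial embeddings place by place (hypothetical helper).

    Each place takes the single non-'*' value found among the embeddings;
    None is returned if two embeddings bind the same place differently.
    """
    joined = []
    for column in zip(*embeddings):
        bound = {v for v in column if v != STAR}
        if len(bound) > 1:
            return None  # incompatible; per footnote 3 this should not occur
        joined.append(bound.pop() if bound else STAR)
    return tuple(joined)


def reducer2(key: tuple[str, ...],
             values: Iterable[tuple[str, Embedding]],
             subqueries: Sequence[str]) -> Iterator[tuple[str, ...]]:
    """Emit key + joined embedding for each choice of one embedding per subquery."""
    by_subquery = defaultdict(list)
    for subquery, embedding in values:
        by_subquery[subquery].append(embedding)
    # If some subquery sent no embedding, its list is empty and product()
    # yields no combinations, so this reducer emits nothing.
    for combo in product(*(by_subquery[q] for q in subqueries)):
        joined = merge(combo)
        if joined is not None:
            yield key + joined


# Input received by the reducer with key (Person4, Article1, Person1) in Example 10
values = [("Q1", (STAR, '"Title1"')), ("Q2", ("Journal1", STAR)), ("Q3", (STAR, STAR))]
print(list(reducer2(("Person4", "Article1", "Person1"), values, ["Q1", "Q2", "Q3"])))
```

Using `product` keeps the "one embedding for each subquery" rule explicit: a reducer missing any subquery simply produces an empty product.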

Example 10. (Continued from Example 9.) The reducer with key (Person4, Article1, Person1) receives the set { (Q1, < *,"Title1" > ), (Q2, < Journal1,* > ), (Q3, < *,* > ) }, joins these embeddings and returns the answer:

< Person4,Article1,Person1,Journal1,"Title1" >

The reducer with key (Person2,Article2,Person3) receives the set { (Q1, < *,"Title2" > ), (Q2, < Journal1,* > ), (Q3, < *,* > ) }; joining these embeddings gives the answer:

< Person2,Article2,Person3,Journal1,"Title2" >

Notice that no other reducer returns a solution, since none of them receives embeddings for all subqueries.
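Why only these two reducers produce answers can be checked by simulating the shuffle step that groups mapper output by key. In this sketch, `shuffle` and `complete_keys` are hypothetical helper names, and the pair list contains the emitted pairs listed in the examples plus further pairs that the Q2 and Q3 mappers derive from the E and V elements shown; it is not the complete mapper output.

```python
from collections import defaultdict

# Pairs listed in the mapper examples, plus further instances derived from the
# shown E and V of Q2 and Q3 (not the complete mapper output).
emitted = [
    (("Person4", "Article1", "Person1"), ("Q1", ("*", '"Title1"'))),
    (("Person2", "Article2", "Person3"), ("Q1", ("*", '"Title2"'))),
    (("Person4", "Article1", "Person1"), ("Q2", ("Journal1", "*"))),
    (("Person2", "Article2", "Person3"), ("Q2", ("Journal1", "*"))),
    (("Person2", "Article1", "Person1"), ("Q2", ("Journal1", "*"))),
    (("Person4", "Article2", "Person3"), ("Q2", ("Journal1", "*"))),
    (("Person4", "Article1", "Person1"), ("Q3", ("*", "*"))),
    (("Person2", "Article2", "Person3"), ("Q3", ("*", "*"))),
    (("Person4", "Article2", "Person1"), ("Q3", ("*", "*"))),
    (("Person2", "Article1", "Person3"), ("Q3", ("*", "*"))),
]


def shuffle(pairs):
    """Group values by key, as the MapReduce shuffle does before reducing."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def complete_keys(groups, subqueries):
    """Keys whose reducer receives at least one embedding per subquery."""
    needed = set(subqueries)
    return sorted(k for k, vals in groups.items()
                  if needed <= {q for q, _ in vals})


print(complete_keys(shuffle(emitted), ["Q1", "Q2", "Q3"]))
# prints [('Person2', 'Article2', 'Person3'), ('Person4', 'Article1', 'Person1')]
```

Of the six reducer keys in this list, only the two from Example 10 receive embeddings from Q1, Q2 and Q3; the other four receive a single subquery's embedding and emit nothing.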

5.5 Implementation of the Algorithm

An experimental implementation of our algorithm has been developed using Hadoop 1.0.4. For our experiments we used a cluster of 14 nodes with the following characteristics: Intel Pentium(R) Dual-Core CPU E5700 3.00GHz with

3 Notice that the joined embeddings are, by construction, compatible.
