Database Reference
In-Depth Information
slightly different from the interpretation of Path in the simple recursive-doubling method
given in Section 10.8.4 .
Initially, set both Q and Path to be copies of the relation Arc . After the i th round, assume
that Q and Path have the contents described in the previous paragraph. Note that for the
round i = 1, the initial values of Q and Path initially satisfy the conditions as described for
i = 0. On the ( i + 1)th round, we do the following:
(1) Compute a new value for Q by joining it with itself, using the SQL query
SELECT DISTINCT q1.X, q2.Y
FROM Q q1, Q q2
WHERE q1.Y = q2.X;
(2) Subtract Path from the relation Q computed in step (1). Note that step (1) discovers
all paths of length 2 i +1 . But some pairs connected by these paths may also have shorter
paths. The result of step (2) is to leave in Q all and only those pairs ( u, v ) such that the
shortest path from u to v has length exactly 2 i +1 .
(3) Join Path with the new value of Q computed in (2), using the SQL query
SELECT DISTINCT Q.X, Path.Y
FROM Q, Path
WHERE Q.Y = Path.X
At the beginning of the round Path contains all ( y, z ) such that the shortest path from y
to z has a length up to 2 i +1 − 1 from y to z , and the new value of Q contains all pairs ( x,
y ) for which the shortest path from x to y is of length 2 i +1 . Thus, the result of this query
is the set of pairs ( x, y ) such that the shortest path from x to y has a length between 2 i +1
+ 1 and 2 i +2 − 1, inclusive.
(4) Set the new value of Path to be the union of the relation computed in step (3), the new
value of Q computed in step (1), and the old value of Path . These three terms give us
all pairs ( x, y ) whose shortest path is of length 2 i +1 + 1 through 2 i +2 − 1, exactly 2 i +1 ,
and 1 through 2 i +1 − 1, respectively. Thus, the union gives us all shortest paths up to
length 2 i +2 − 1, as required by the inductive hypothesis about what is true after each
round.
Each round of the smart transitive-closure algorithm uses steps that are joins, aggregations
(duplicate eliminations), or unions. A round can thus be implemented as a short sequence
of MapReduce jobs. Further, a good deal of work can be saved if these operations can be
Search WWH ::




Custom Search