Database Reference
In-Depth Information
slightly different from the interpretation of
Path
in the simple recursive-doubling method
given in
Section 10.8.4
.
Initially, set both
Q
and
Path
to be copies of the relation
Arc
. After the
i
th round, assume
that
Q
and
Path
have the contents described in the previous paragraph. Note that for the
round
i
= 1, the initial values of
Q
and
Path
initially satisfy the conditions as described for
i
= 0. On the (
i
+ 1)th round, we do the following:
(1) Compute a new value for
Q
by joining it with itself, using the SQL query
SELECT DISTINCT q1.X, q2.Y
FROM Q q1, Q q2
WHERE q1.Y = q2.X;
(2) Subtract
Path
from the relation
Q
computed in step (1). Note that step (1) discovers
all paths of length 2
i
+1
. But some pairs connected by these paths may also have shorter
paths. The result of step (2) is to leave in
Q
all and only those pairs (
u, v
) such that the
shortest path from
u
to
v
has length exactly 2
i
+1
.
(3) Join
Path
with the new value of
Q
computed in (2), using the SQL query
SELECT DISTINCT Q.X, Path.Y
FROM Q, Path
WHERE Q.Y = Path.X
At the beginning of the round
Path
contains all (
y, z
) such that the shortest path from
y
to
z
has a length up to 2
i
+1
− 1 from
y
to
z
, and the new value of
Q
contains all pairs (
x,
y
) for which the shortest path from
x
to
y
is of length 2
i
+1
. Thus, the result of this query
is the set of pairs (
x, y
) such that the shortest path from
x
to
y
has a length between 2
i
+1
+ 1 and 2
i
+2
− 1, inclusive.
(4) Set the new value of
Path
to be the union of the relation computed in step (3), the new
value of
Q
computed in step (1), and the old value of
Path
. These three terms give us
all pairs (
x, y
) whose shortest path is of length 2
i
+1
+ 1 through 2
i
+2
− 1, exactly 2
i
+1
,
and 1 through 2
i
+1
− 1, respectively. Thus, the union gives us all shortest paths up to
length 2
i
+2
− 1, as required by the inductive hypothesis about what is true after each
round.
Each round of the smart transitive-closure algorithm uses steps that are joins, aggregations
(duplicate eliminations), or unions. A round can thus be implemented as a short sequence
of MapReduce jobs. Further, a good deal of work can be saved if these operations can be