Database Reference
In-Depth Information
take the union of the result of this query with the
Arc
relation itself, then we get all paths
of length between 1 and 2
i
+1
and can use the union as the
Path
relation in the next round
of recursive doubling. The query itself can be implemented by two MapReduce jobs, one
to do the join and the other to do the union and eliminate duplicates. As we observed for
the parallel reachability computation, the methods of
Sections 2.3.7
and
2.3.8
suffice. The
union, discussed in
Section 2.3.6
doesn't really require a MapReduce job of its own; it can
be combined with the duplicate-elimination.
If a graph has diameter
d
, then after log
2
d
rounds of the above algorithm
Path
contains
all pairs (
x, y
) connected by a path of length at most
d
; that is, it contains all pairs in the
transitive closure. Unless we already know
d
, one more round will be necessary to verify
that no more pairs can be found, but for large
d
, this process takes many fewer rounds than
the breadth-first search that we used for reachability.
However, the above recursive-doubling method does a lot of redundant work. An ex-
ample should make the point clear.
EXAMPLE
10.28 Suppose the shortest path from
x
0
to
x
17
is of length 17; in particular, let
there be a path
x
0
→
x
1
→ · · · →
x
17
. We shall discover the fact
Path
(
x
0
, x
17
) on the fifth
round, when
Path
contains all pairs connected by paths of length up to 16. The same path
from
x
0
to
x
17
will be discovered 16 times when we join
Path
with itself. That is, we can
take the fact
Path
(
x
0
, x
16
) and combine it with
Path
(
x
16
, x
17
) to obtain
Path
(
x
0
, x
17
). Or we
can combine
Path
(
x
0
, x
15
) with
Path
(
x
15
, x
17
) to discover the same fact, and so on.
□
10.8.5
Smart Transitive Closure
A variant of recursive doubling that avoids discovering the same path more than once is
called
smart
transitive closure. Every path of length greater than 1 can be broken into a
head
whose length is a power of 2, followed by a
tail
whose length is no greater than the
length of the head.
EXAMPLE
10.29 A path of length 13 has a head consisting of the first 8 arcs, followed by a
tail consisting of the last 5 arcs. A path of length 2 is a head of length 1 followed by a tail
of length 1. Note that 1 is a power of 2 (the 0th power), and the tail will be as long as the
head whenever the path itself has a length that is a power of 2.
□
To implement smart transitive closure in SQL, we introduce a relation
Q
(
X, Y
) whose
function after the
i
th round is to hold all pairs of nodes (
x, y
) such that the shortest path
from
x
to
y
is of length exactly 2
i
. Also, after the
i
th round,
Path
(
x, y
) will be true if the
shortest path from
x
to
y
is of length at most 2
i
+1
−1. Note that this interpretation of
Path
is