Database Reference
In-Depth Information
the box on “Six Degrees of Separation.” If so, computing reachability in parallel, using
MapReduce or another approach, is feasible. Few rounds of computation will be needed
and the space requirements are not greater than the space it takes to represent the graph.
However, there are some graphs where the number of rounds is a serious impediment.
For instance, in a typical portion of the Web, it has been found that most pages reachable
from a given page are reachable by paths of length 10-15. Yet there are some pairs of pages
such that the first reaches the second, but only through paths whose length is measured in
the hundreds. For instance, blogs are sometimes structured so each response is reachable
only through the comment to which it responds. Running arguments lead to long paths with
no way to “shortcut” around that path. Or a tutorial on the Web, with 50 chapters, may be
structured so you can only get to Chapter i through the page for Chapter i −1.
Interestingly, the transitive closure can be computed much faster in parallel than can
strict reachability. By a recursive-doubling technique, we can double the length of paths
we know about in a single round. Thus, on a graph of diameter d , we need only log 2 d
rounds, rather than d rounds. If d = 6, the difference is not important, but if d = 1000, log 2
d is about 10, so there is a hundredfold reduction in the number of rounds. The problem, as
discussed above, is that while we can compute the transitive closure quickly, we must com-
pute many more facts than are needed for a reachability computation on the same graph,
and therefore the space requirements for transitive closure can greatly exceed the space re-
quirements for reachability. That is, if all we want is the set Reach ( v ), we can compute the
transitive closure of the entire graph, and then throw away all pairs that do not have v as
their first component. But we cannot throw away all those pairs until we are done. During
the computation of the transitive closure, we could wind up computing many facts Path ( x,
y ), where neither x nor y is reachable from v , and even if they are reachable from v , we may
not need to know x can reach y .
Assuming the graph is small enough that we can compute the transitive closure in its
entirety, we still must be careful how we do so using MapReduce or another parallelism
approach. The simplest recursive-doubling approach is to start the the relation Path ( X, Y )
equal to the arc relation Arc ( X, Y ). Suppose that after i rounds, Path ( X, Y ) contains all pairs
( x, y ) such that there is a path from x to y of length at most 2 i . Then if we join Path with
itself at the next round, we shall discover all those pairs ( x, y ) such that there is a path from
x to y of length at most 2 × 2 i = 2 i +1 . The recursive-doubling query in SQL is
SELECT DISTINCT p1.X, p2.Y
FROM Path p1, Path p2
WHERE p1.Y = p2.X;
After computing this query, we get all pairs connected by a path of length between 2 and
2 i +1 , assuming Path contains pairs connected by paths of length between 1 and 2 i . If we
Search WWH ::




Custom Search