Database Reference
In-Depth Information
These two concepts relate to the notions of neighborhoods that we have seen earlier. In
fact, Path ( u, v ) is true if and only if v is in N ( u, ∞), which we define to be i≥ 0 N ( u, i ).
Thus, the reachability problem is to compute the union of all the neighborhoods of any
radius for a given node u . The discussion in Section 10.8.2 reminds us that we can com-
pute the reachable set for u by computing its neighborhoods up to that smallest radius d for
which N ( u, d ) = N ( u, d + 1).
The two problems - transitive closure and reachability - are related, but there are many
examples of graphs where reachability is feasible and transitive closure is not. For instance,
suppose we have a Web graph of a billion nodes. If we want to find the pages (nodes)
reachable from a given page, we can do so, even on a single machine with a large main
memory. However, just to produce the transitive closure of the graph could involve 10 18
pairs of nodes, which is not practical, even using a large cluster of computers. 5
10.8.4
Transitive Closure Via MapReduce
When it comes to parallel implementation, transitive closure is actually more readily par-
allelizable than is reachability. If we want to compute N ( v, ∞), the set of nodes reachable
from node v , without computing the entire transitive closure, we have no option but to com-
pute the sequence of neighborhoods, which is essentially a breadth-first search of the graph
from v . In relational terms, suppose we have a relation Arc ( X, Y ) containing those pairs ( x,
y ) such that there is an arc x y . We want to compute iteratively a relation Reach ( X ) that
is the set of nodes reachable from node v . After i rounds, Reach ( X ) will contain all those
nodes in N ( v, i ).
Initially, Reach ( X ) contains only the node v . Suppose it contains all the nodes in N ( v, i )
after some round of MapReduce. To construct N ( v, i + 1) we need to join Reach with the
Arc relation, then project onto the second component and perform a union of the result with
the old value of Reach . In SQL terms, we perform
SELECT DISTINCT Arc.Y
FROM Reach, Arc
WHERE Arc.X = Reach.X;
This query asks us to compute the natural join of Reach ( X ) and Arc ( X, Y ), which we can
do by MapReduce as explained in Section 2.3.7 . Then we have to group the result by Y and
eliminate duplicates, a process that can be done by another MapReduce job as in Section
2.3.8 .
How many rounds this process requires depends on how far from v is the furthest node
that v can reach. In many social-network graphs, the diameter is small, as discussed in
Search WWH ::




Custom Search