Database Reference
In-Depth Information
10.7.3
Optimality of the Triangle-Finding Algorithm
It turns out that the algorithm described in Section 10.7.2 is, to within an order of mag-
nitude, the best possible. To see why, consider a complete graph on n nodes. This graph has
edges and the number of triangles is Since we cannot enumerate triangles in
less time than the number of those triangles, we know any algorithm will take Ω( n 3 ) time
on this graph. However, m = O ( n 2 ), so any algorithm takes Ω( m 3/2 ) time on this graph.
One might wonder if there were a better algorithm that worked on sparser graphs than
the complete graph. However, we can add to the complete graph a chain of nodes with any
length up to n 2 . This chain adds no more triangles. It no more than doubles the number of
edges, but makes the number of nodes as large as we like, in effect lowering the ratio of
edges to nodes to be as close to 1 as we like. Since there are still Ω( m 3/2 ) triangles, we see
that this lower bound holds for the full range of possible ratios of m/n .
10.7.4
Finding Triangles Using MapReduce
For a very large graph, we want to use parallelism to speed the computation. We can ex-
press triangle-finding as a multiway join and use the technique of Section 2.5.3 to optimize
the use of a single MapReduce job to count triangles. It turns out that this use is one where
the multiway-join technique of that section is generally much more efficient than taking
two two-way joins. Moreover, the total execution time of the parallel algorithm is essen-
tially the same as the execution time on a single processor using the algorithm of Section
10.7.2 .
To begin, assume that the nodes of a graph are numbered 1 , 2 , . . . , n . We use a relation
E to represent edges. To avoid representing each edge twice, we assume that if E ( A, B ) is
a tuple of this relation, then not only is there an edge between nodes A and B , but also, as
integers, we have A < B . 3 This requirement also eliminates loops (edges from a node to it-
self), which we generally assume do not exist in social-network graphs anyway, but which
could lead to “triangles” that really do not involve three different nodes.
Using this relation, we can express the set of triangles of the graph whose edges are E by
the natural join
E ( X, Y ) E ( X, Z ) E ( Y, Z )
(10.2)
To understand this join, we have to recognize that the attributes of the relation E are given
different names in each of the three uses of E . That is, we imagine there are three copies
Search WWH ::




Custom Search