Mining Social-Network Graphs - Mining of Massive Datasets

Database Reference

In-Depth Information

10.7.3

Optimality of the Triangle-Finding Algorithm

It turns out that the algorithm described in Section 10.7.2 is, to within an order of mag-

nitude, the best possible. To see why, consider a complete graph on n nodes. This graph has

edges and the number of triangles is Since we cannot enumerate triangles in

less time than the number of those triangles, we know any algorithm will take Ω( n 3 ) time

on this graph. However, m = O ( n 2 ), so any algorithm takes Ω( m 3/2 ) time on this graph.

One might wonder if there were a better algorithm that worked on sparser graphs than

the complete graph. However, we can add to the complete graph a chain of nodes with any

length up to n 2 . This chain adds no more triangles. It no more than doubles the number of

edges, but makes the number of nodes as large as we like, in effect lowering the ratio of

edges to nodes to be as close to 1 as we like. Since there are still Ω( m 3/2 ) triangles, we see

that this lower bound holds for the full range of possible ratios of m/n .

10.7.4

Finding Triangles Using MapReduce

For a very large graph, we want to use parallelism to speed the computation. We can ex-

press triangle-finding as a multiway join and use the technique of Section 2.5.3 to optimize

the use of a single MapReduce job to count triangles. It turns out that this use is one where

the multiway-join technique of that section is generally much more efficient than taking

two two-way joins. Moreover, the total execution time of the parallel algorithm is essen-

tially the same as the execution time on a single processor using the algorithm of Section

10.7.2 .

To begin, assume that the nodes of a graph are numbered 1 , 2 , . . . , n . We use a relation

E to represent edges. To avoid representing each edge twice, we assume that if E ( A, B ) is

a tuple of this relation, then not only is there an edge between nodes A and B , but also, as

integers, we have A < B . 3 This requirement also eliminates loops (edges from a node to it-

self), which we generally assume do not exist in social-network graphs anyway, but which

could lead to “triangles” that really do not involve three different nodes.

Using this relation, we can express the set of triangles of the graph whose edges are E by

the natural join

E ( X, Y ) ⋈ E ( X, Z ) ⋈ E ( Y, Z )

(10.2)

To understand this join, we have to recognize that the attributes of the relation E are given

different names in each of the three uses of E . That is, we imagine there are three copies

Search WWH ::

Custom Search

Home