Mining Social-Network Graphs - Mining of Massive Datasets

Database Reference

In-Depth Information

(1) Compute the degree of each node. This part requires only that we examine each edge

and add 1 to the count of each of its two nodes. The total time required is O ( m ).

(2) Create an index on edges, with the pair of nodes at its ends as the key. That is, the

index allows us to determine, given two nodes, whether the edge between them ex-

ists. A hash table suffices. It can be constructed in O ( m ) time, and the expected time

to answer a query about the existence of an edge is a constant, at least in the expected

sense. 2

(3) Create another index of edges, this one with key equal to a single node. Given a node

v , we can retrieve the nodes adjacent to v in time proportional to the number of those

nodes. Again, a hash table, this time with single nodes as the key, suffices in the ex-

pected sense.

We shall order the nodes as follows. First, order nodes by degree. Then, if v and u have

the same degree, recall that both v and u are integers, so order them numerically. That is,

we say v ≺ u if and only if either

(i) The degree of v is less than the degree of u , or

(ii) The degrees of u and v are the same, and v < u .

Heavy-Hitter Triangles : There are only heavy-hitter nodes, so we can consider

all sets of three of these nodes. There are O ( m 3/2 ) possible heavy-hitter triangles, and using

the index on edges we can check if all three edges exist in O (1) time. Therefore, O ( m 3/2 )

time is needed to find all the heavy-hitter triangles.

Other Triangles : We find the other triangles a different way. Consider each edge ( v 1 ,

v 2 ). If both v 1 and v 2 are heavy hitters, ignore this edge. Suppose, however, that v 1 is not

a heavy hitter and moreover v 1 ≺ v 2 . Let u 1 , u 2 , . . . , u k be the nodes adjacent to v 1 . Note

that We can find these nodes, using the index on nodes, in O ( k ) time, which is

surely time. For each u i we can use the first index to check whether edge ( u i , v 2 )

exists in O (1) time. We can also determine the degree of u i in O (1) time, because we have

counted all the nodes' degrees. We count the triangle { v 1 , v 2 , u i } if and only if the edge ( u i ,

v 2 ) exists, and v 1 ≺ u i . In that way, a triangle is counted only once - when v 1 is the node

of the triangle that precedes both other nodes of the triangle according to the ≺ ordering.

Thus, the time to process all the nodes adjacent to v 1 is

Since there are m edges,

the total time spent counting other triangles is O ( m 3/2 ).

We now see that preprocessing takes O ( m ) time. The time to find heavy-hitter triangles

is O ( m 3/2 ), and so is the time to find the other triangles. Thus, the total time of the algorithm

is O ( m 3/2 ).

Search WWH ::

Custom Search

Home