Mining Social-Network Graphs - Mining of Massive Datasets

Database Reference

In-Depth Information

combined, say by using the more general patterns of communication permitted by a work-

flow system (see Section 2.4.1 ).

Path Facts Versus Paths

We should be careful to distinguish between a path, which is a sequence of arcs, and a path fact, which is a statement

that there exists a path from some node x to some node y . The path fact has been shown typically as Path ( x, y ). Smart

transitive closure discovers each path only once, but it may discover a path fact more than once. The reason is that

often a graph will have many paths from x to y , and may even have many different paths from x to y that are of the

same length.

Not all paths are discovered independently by smart transitive closure. For instance, if there are arcs w → x → y →

z and also arcs x → u → z , then the path fact Path ( w, z ) will be discovered twice, once by combining Path ( w, y ) with

Path ( y, z ) and again when combining Path ( w, u ) with Path ( u, z ). On the other hand, if the arcs were w → x → y → z

and w → v → y , then Path ( w, z ) would be discovered only once, when combining Path ( w, y ) with Path ( y, z ).

10.8.6

Transitive Closure by Graph Reduction

A typical directed graph such as the Web contains many strongly connected components

(SCCs). We can collapse an SCC to a single node as far as computing the transitive closure

is concerned, since all the nodes in an SCC reach exactly the same nodes. There is an eleg-

ant algorithm for finding the SCCs of a graph in time linear in the size of the graph, due to

J.E. Hopcroft and R.E. Tarjan. However, this algorithm is inherently sequential, based on

depth-first search, and so not well suited to parallel impelementation on large graphs.

We can find most of the SCCs in a graph by some random node selections followed by

two breadth-first searches. Moreover, the larger an SCC is, the more likely it is to be col-

lapsed early, thus reducing the size of the graph quickly. The algorithm for reducing SCCs

to single nodes is as follows. Let G be the graph to be reduced, and let G ′ be G with all the

arcs reversed.

(1) Pick a node v from G at random.

(2) Find N G ( v, ∞), the set of nodes reachable from v in G .

(3) Find N G ′ ( v, ∞), the set of nodes that v reaches in the graph G ′ that has the arcs of G

reversed. Equivalently, this set is those nodes that reach v in G .

(4) Construct the SCC S containing v , which is N G ( v, ∞) ∩ N G ′ ( v, ∞). That is, v and u are

in the same SCC of G if and only if v can reach u and u can reach v .

(5) Replace SCC S by a single node s in G . To do so, delete all nodes in S from G and add

s to the node set of G . Delete from G all arcs of which one or both ends are in S . Then,

add to the arc set of G an arc s → x whenever there was an arc in G from any member

of S to x . Finally, add an arc x → s if there was an arc from x to any member of S .

Search WWH ::

Custom Search

Home