Mining Social-Network Graphs - Mining of Massive Datasets

Database Reference

In-Depth Information

assign a node to the cluster with the shortest average distance to all the nodes of the cluster,

then D should be assigned to the cluster of F , as long as we do not try to place D before

any other nodes are assigned. However, in large graphs, we shall surely make mistakes on

some of the first nodes we place.

□

10.2.3

Betweenness

Since there are problems with standard clustering methods, several specialized clustering

techniques have been developed to find communities in social networks. In this section we

shall consider one of the simplest, based on finding the edges that are least likely to be in-

side a community.

Define the betweenness of an edge ( a, b ) to be the number of pairs of nodes x and y such

that the edge ( a, b ) lies on the shortest path between x and y . To be more precise, since there

can be several shortest paths between x and y , edge ( a, b ) is credited with the fraction of

those shortest paths that include the edge ( a, b ). As in golf, a high score is bad. It suggests

that the edge ( a, b ) runs between two different communities; that is, a and b do not belong

to the same community.

EXAMPLE 10.5 In Fig. 10.3 the edge ( B, D ) has the highest betweenness, as should surprise

no one. In fact, this edge is on every shortest path between any of A , B , and C to any of D ,

E , F , and G . Its betweenness is therefore 3×4 = 12. In contrast, the edge ( D, F ) is on only

four shortest paths: those from A , B , C , and D to F .

□

10.2.4

The Girvan-Newman Algorithm

In order to exploit the betweenness of edges, we need to calculate the number of shortest

paths going through each edge. We shall describe a method called the Girvan-Newman

(GN) Algorithm, which visits each node X once and computes the number of shortest paths

from X to each of the other nodes that go through each of the edges. The algorithm begins

by performing a breadth-first search (BFS) of the graph, starting at the node X . Note that

the level of each node in the BFS presentation is the length of the shortest path from X to

that node. Thus, the edges that go between nodes at the same level can never be part of a

shortest path from X .

Edges between levels are called DAG edges (“DAG” stands for directed, acyclic graph).

Each DAG edge will be part of at least one shortest path from root X . If there is a DAG

edge ( Y, Z ), where Y is at the level above Z (i.e., closer to the root), then we shall call Y

Search WWH ::

Custom Search

Home