Database Reference
In-Depth Information
assign a node to the cluster with the shortest average distance to all the nodes of the cluster,
then D should be assigned to the cluster of F , as long as we do not try to place D before
any other nodes are assigned. However, in large graphs, we shall surely make mistakes on
some of the first nodes we place.
10.2.3
Betweenness
Since there are problems with standard clustering methods, several specialized clustering
techniques have been developed to find communities in social networks. In this section we
shall consider one of the simplest, based on finding the edges that are least likely to be in-
side a community.
Define the betweenness of an edge ( a, b ) to be the number of pairs of nodes x and y such
that the edge ( a, b ) lies on the shortest path between x and y . To be more precise, since there
can be several shortest paths between x and y , edge ( a, b ) is credited with the fraction of
those shortest paths that include the edge ( a, b ). As in golf, a high score is bad. It suggests
that the edge ( a, b ) runs between two different communities; that is, a and b do not belong
to the same community.
EXAMPLE 10.5 In Fig. 10.3 the edge ( B, D ) has the highest betweenness, as should surprise
no one. In fact, this edge is on every shortest path between any of A , B , and C to any of D ,
E , F , and G . Its betweenness is therefore 3×4 = 12. In contrast, the edge ( D, F ) is on only
four shortest paths: those from A , B , C , and D to F .
10.2.4
The Girvan-Newman Algorithm
In order to exploit the betweenness of edges, we need to calculate the number of shortest
paths going through each edge. We shall describe a method called the Girvan-Newman
(GN) Algorithm, which visits each node X once and computes the number of shortest paths
from X to each of the other nodes that go through each of the edges. The algorithm begins
by performing a breadth-first search (BFS) of the graph, starting at the node X . Note that
the level of each node in the BFS presentation is the length of the shortest path from X to
that node. Thus, the edges that go between nodes at the same level can never be part of a
shortest path from X .
Edges between levels are called DAG edges (“DAG” stands for directed, acyclic graph).
Each DAG edge will be part of at least one shortest path from root X . If there is a DAG
edge ( Y, Z ), where Y is at the level above Z (i.e., closer to the root), then we shall call Y
Search WWH ::




Custom Search