Database Reference
In-Depth Information
techniques we learned in Chapter 7 are generally unsuitable for the problem of clustering
social-network graphs.
10.2.1
Distance Measures for Social-Network Graphs
If we were to apply standard clustering techniques to a social-network graph, our first step
would be to define a distance measure. When the edges of the graph have labels, these la-
bels might be usable as a distance measure, depending on what they represented. But when
the edges are unlabeled, as in a “friends” graph, there is not much we can do to define a
suitable distance.
Our first instinct is to assume that nodes are close if they have an edge between them and
distant if not. Thus, we could say that the distance d ( x, y ) is 0 if there is an edge ( x, y ) and
1 if there is no such edge. We could use any other two values, such as 1 and ∞, as long as
the distance is closer when there is an edge.
Neither of these two-valued “distance measures” - 0 and 1 or 1 and ∞ - is a true distance
measure. The reason is that they violate the triangle inequality when there are three nodes,
with two edges between them. That is, if there are edges ( A, B ) and ( B, C ), but no edge ( A,
C ), then the distance from A to C exceeds the sum of the distances from A to B to C . We
could fix this problem by using, say, distance 1 for an edge and distance 1.5 for a missing
edge. But the problem with two-valued distance functions is not limited to the triangle in-
equality, as we shall see in the next section.
10.2.2
Applying Standard Clustering Methods
Recall from Section 7.1.2 that there are two general approaches to clustering: hierarchical
(agglomerative) and point-assignment. Let us consider how each of these would work on
a social-network graph. First, consider the hierarchical methods covered in Section 7.2 . In
particular, suppose we use as the intercluster distance the minimum distance between nodes
of the two clusters.
Hierarchical clustering of a social-network graph starts by combining some two nodes
that are connected by an edge. Successively, edges that are not between two nodes of the
same cluster would be chosen randomly to combine the clusters to which their two nodes
belong. The choices would be random, because all distances represented by an edge are the
same.
EXAMPLE 10.3 Consider again the graph of Fig. 10.1 , repeated here as Fig. 10.3 . First, let
us agree on what the communities are. At the highest level, it appears that there are two
Search WWH ::




Custom Search