Mining Social-Network Graphs - Mining of Massive Datasets

Database Reference

In-Depth Information

techniques we learned in Chapter 7 are generally unsuitable for the problem of clustering

social-network graphs.

10.2.1

Distance Measures for Social-Network Graphs

If we were to apply standard clustering techniques to a social-network graph, our first step

would be to define a distance measure. When the edges of the graph have labels, these la-

bels might be usable as a distance measure, depending on what they represented. But when

the edges are unlabeled, as in a “friends” graph, there is not much we can do to define a

suitable distance.

Our first instinct is to assume that nodes are close if they have an edge between them and

distant if not. Thus, we could say that the distance d ( x, y ) is 0 if there is an edge ( x, y ) and

1 if there is no such edge. We could use any other two values, such as 1 and ∞, as long as

the distance is closer when there is an edge.

Neither of these two-valued “distance measures” - 0 and 1 or 1 and ∞ - is a true distance

measure. The reason is that they violate the triangle inequality when there are three nodes,

with two edges between them. That is, if there are edges ( A, B ) and ( B, C ), but no edge ( A,

C ), then the distance from A to C exceeds the sum of the distances from A to B to C . We

could fix this problem by using, say, distance 1 for an edge and distance 1.5 for a missing

edge. But the problem with two-valued distance functions is not limited to the triangle in-

equality, as we shall see in the next section.

10.2.2

Applying Standard Clustering Methods

Recall from Section 7.1.2 that there are two general approaches to clustering: hierarchical

(agglomerative) and point-assignment. Let us consider how each of these would work on

a social-network graph. First, consider the hierarchical methods covered in Section 7.2 . In

particular, suppose we use as the intercluster distance the minimum distance between nodes

of the two clusters.

Hierarchical clustering of a social-network graph starts by combining some two nodes

that are connected by an edge. Successively, edges that are not between two nodes of the

same cluster would be chosen randomly to combine the clusters to which their two nodes

belong. The choices would be random, because all distances represented by an edge are the

same.

EXAMPLE 10.3 Consider again the graph of Fig. 10.1 , repeated here as Fig. 10.3 . First, let

us agree on what the communities are. At the highest level, it appears that there are two

Search WWH ::

Custom Search

Home