Database Reference
In-Depth Information
10.5.1
The Nature of Communities
To begin, let us consider what we would expect two overlapping communities to look like.
Our data is a social graph, where nodes are people and there is an edge between two nodes
if the people are “friends.” Let us imagine that this graph represents students at a school,
and there are two clubs in this school: the Chess Club and the Spanish Club. It is reason-
able to suppose that each of these clubs forms a community, as does any other club at the
school. It is also reasonable to suppose that two people in the Chess Club are more likely
to be friends in the graph because they know each other from the club. Likewise, if two
people are in the Spanish Club, then there is a good chance they know each other, and are
likely to be friends.
What if two people are in both clubs? They now have two reasons why they might know
each other, and so we would expect an even greater probability that they will be friends in
the social graph. Our conclusion is that we expect edges to be dense within any commu-
nity, but we expect edges to be even denser in the intersection of two communities, denser
than that in the intersection of three communities, and so on. The idea is suggested by Fig.
10.19 .
Figure 10.19 The overlap of two communities has more friend edges than the nonoverlapping parts of these communit-
ies
10.5.2
Maximum-Likelihood Estimation
Before we see the algorithm for finding communities that have overlap of the kind sugges-
ted in Section 10.5.1 , let us digress and learn a useful modeling tool called maximum-like-
lihood estimation , or MLE. The idea behind MLE is that we make an assumption about the
generative process (the model ) that creates instances of some artifact, for example, “friends
graphs.” The model has parameters that determine the probability of generating any par-
ticular instance of the artifact; this probability is called the likelihood of those parameter
values. We assume that the value of the parameters that gives the largest value of the like-
lihood is the correct model for the observed artifact.
Search WWH ::




Custom Search