Database Reference
In-Depth Information
we can never assign a nonzero likelihood to any set of communities that do not have every
pair of individuals sharing a community. But by taking the probability to be very small, we
bias our computation to favor solutions such that every observed edge is explained by joint
membership in some community.
If we know which nodes are in which communities, then we can compute the likelihood
of the given graph for these edge probabilities using a simple generalization of
Example
lihood of
E
being exactly the set of edges in the observed graph is
EXAMPLE
10.22 Consider the tiny social graph in
Fig. 10.20
. Suppose there are two com-
munities
C
and
D
, with associated probabilities
p
C
and
p
D
. Also, suppose that we have de-
termined (or are using as a temporary hypothesis) that
C
= {
w, x, y
} and
D
= {
w, y, z
}. To
begin, consider the pair of nodes
w
and
x
.
M
wx
= {
C
}; that is, this pair is in community
C
but not in community
D
. Therefore,
p
wx
= 1 − (1 −
p
C
) =
p
C
.
Figure 10.20
A social graph
Similarly,
x
and
y
are only together in
C
,
y
and
z
are only together in
D
, and likewise
w
and
z
are only together in
D
. Thus, we find
p
xy
=
p
C
and
p
yz
=
p
wz
=
p
D
. Now the pair
w
and
y
are together in both communities, so
p
wy
= 1 − (1 −
p
C
)(1 −
p
D
) =
p
C
+
p
D
−
p
C
p
D
. Finally,
x
and
z
are not together in either community, so
p
xz
=
ϵ
.
Now, we can compute the likelihood of the graph in
Fig. 10.20
, given our assumptions
about membership in the two communities. This likelihood is the product of the probabilit-
ies associated with each of the four pairs of nodes whose edges appear in the graph, times
one minus the probability for each of the two pairs whose edges are not there. That is, we
want
p
wx
p
wy
p
xy
p
yz
(1 −
p
wz
)(1 −
p
xz
)
Substituting the expressions we developed above for each of these probabilities, we get
(
p
C
)
2
p
D
(
p
C
+
p
D
−
p
C
p
D
)(1 −
p
D
)(1 −
ϵ
)
Note that
ϵ
is very small, so the last factor is essentially 1 and can be dropped.