Mining Social-Network Graphs - Mining of Massive Datasets

Database Reference

In-Depth Information

likely it is that this community will cause them to have an edge between them. In this mod-

el, we can adjust the strength of membership for an individual in a community continu-

ously, just as we can adjust the probability associated with a community in the affiliation-

graph model. That improvement allows us to use standard methods, such as gradient des-

cent, to maximize the expression for likelihood. In the improved model, we have

(1) Fixed sets of communities and individuals, as before.

(2) For each community C and individual x , there is a strength of membership parameter

F x C . These parameters can take any nonnegative value, and a value of 0 means the in-

dividual is definitely not in the community.

(3) The probability that community C causes there to be an edge between nodes u and v is

p C ( u, v ) = 1 − e − F u C F v C

As before, the probability of there being an edge between u and v is 1 minus the probab-

ility that none of the communities causes there to be an edge between them. That is, each

community independently causes edges, and an edge exists between two nodes if any com-

munity causes it to exist. More formally, p uv , the probability of an edge between nodes u

and v can be calculated as

Log Likelihood

Usually, we compute the logarithm of the likelihood function (the log likelihood ), rather than the function itself. Doing

so offers several advantages. Products become sums, which often simplifies the expression. Also, summing many

numbers is less prone to numerical rounding errors than is taking the product of many tiny numbers.

If we substitute the formula for p C ( u, v ) that is assumed in the model, we get

p uv = 1 - e -∑ C F u C F v C

Finally, let E be the set of edges in the observed graph. As before, we can write the for-

mula for the likelihood of the observed graph as the product of the expression for p uv for

each edge ( u, v ) that is in E , times the product of 1 − p uv for each edge ( u, v ) that is not in

E . Thus, in the new model, the formula for the likelihood of the graph with edges E is

We can simplify this expression somewhat by taking its logarithm. Remember that max-

imizing a function also maximizes the logarithm of that function, and vice versa. So we can

Search WWH ::

Custom Search

Home