Database Reference
In-Depth Information
The Girvan-Newman Algorithm : The Girvan-Newman Algorithm is an efficient technique for computing the
betweenness of edges. A breadth-first search from each node is performed, and a sequence of labeling steps com-
putes the share of paths from the root to each other node that go through each of the edges. The shares for an edge
that are computed for each root are summed to get the betweenness.
Communities and Complete Bipartite Graphs : A complete bipartite graph has two groups of nodes, all possible
edges between pairs of nodes chosen one from each group, and no edges between nodes of the same group. Any suf-
ficiently dense community (a set of nodes with many edges among them) will have a large complete bipartite graph.
Finding Complete Bipartite Graphs : We can find complete bipartite graphs by the same techniques we used for find-
ing frequent itemsets. Nodes of the graph can be thought of both as the items and as the baskets. The basket corres-
ponding to a node is the set of adjacent nodes, thought of as items. A complete bipartite graph with node groups of
size t and s can be thought of as finding frequent itemsets of size t with support s .
Graph Partitioning : One way to find communities is to partition a graph repeatedly into pieces of roughly similar
sizes. A cut is a partition of the nodes of the graph into two sets, and its size is the number of edges that have one
end in each set. The volume of a set of nodes is the number of edges with at least one end in that set.
Normalized Cuts : We can normalize the size of a cut by taking the ratio of the size of the cut and the volume of each
of the two sets formed by the cut. Then add these two ratios to get the normalized cut value. Normalized cuts with a
low sum are good, in the sense that they tend to divide the nodes into two roughly equal parts, and have a relatively
small size.
Adjacency Matrices : These matrices describe a graph. The entry in row i and column j is 1 if there is an edge between
nodes i and j , and 0 otherwise.
Degree Matrices : The degree matrix for a graph has d in the i th diagonal entry if d is the degree of the i th node. Off
the diagonal, all entries are 0.
Laplacian Matrices : The Laplacian matrix for a graph is its degree matrix minus its adjacency matrix. That is, the
entry in row i and column i of the Laplacian matrix is the degree of the i th node of the graph, and the entry in row i
and column j , for i j , is −1 if there is an edge between nodes i and j , and 0 otherwise.
Spectral Method for Partitioning Graphs : The lowest eigenvalue for any Laplacian matrix is 0, and its corresponding
eigenvector consists of all 1's. The eigenvectors corresponding to small eigenvalues can be used to guide a partition
of the graph into two parts of similar size with a small cut value. For one example, putting the nodes with a positive
component in the eigenvector with the second-smallest eigenvalue into one set and those with a negative component
into the other is usually good.
Overlapping Communities : Typically, individuals are members of several communities. In graphs describing social
networks, it is normal for the probability that two individuals are friends to rise as the number of communities of
which both are members grows.
The Affiliation-Graph Model : An appropriate model for membership in communities is to assume that for each com-
munity there is a probability that because of this community two members become friends (have an edge in the
social network graph). Thus, the probability that two nodes have an edge is 1 minus the product of the probabilities
that none of the communities of which both are members cause there to be an edge between them. We then find
the assignment of nodes to communities and the values of those probabilities that best describes the observed social
graph.
Maximum-Likelihood Estimation : An important modeling technique, useful for modeling communities as well as
many other things, is to compute, as a function of all choices of parameter values that the model allows, the prob-
ability that the observed data would be generated. The values that yield the highest probability are assumed to be
correct, and called the maximum-likelihood estimate (MLE).
Use of Gradient Descent : If we know membership in communities, we can find the MLE by gradient descent or
other methods. However, we cannot find the best membership in communities by gradient descent, because mem-
bership is discrete, not continuous.
Improved Community Modeling by Strength of Membership : We can formulate the problem of finding the MLE of
communities in a social graph by assuming individuals have a strength of membership in each community, possibly
0 if they are not a member. If we define the probability of an edge between two nodes to be a function of their
Search WWH ::




Custom Search