Database Reference
In-Depth Information
membership strengths in their common communities, we can turn the problem of finding the MLE into a continuous
problem and solve it using gradient descent.
Simrank : One way to measure the similarity of nodes in a graph with several types of nodes is to start a random
walker at one node and allow it to wander, with a fixed probability of restarting at the same node. The distribution
of where the walker can be expected to be is a good measure of the similarity of nodes to the starting node. This
process must be repeated with each node as the starting node if we are to get all-pairs similarity.
Triangles in Social Networks : The number of triangles per node is an important measure of the closeness of a com-
munity and often reflects its maturity. We can enumerate or count the triangles in a graph with m edges in O ( m 3 /2 )
time, but no more efficient algorithm exists in general.
Triangle Finding by MapReduce : We can find triangles in a single round of MapReduce by treating it as a three-way
join. Each edge must be sent to a number of reducers proportional to the cube root of the total number of reducers,
and the total computation time spent at all the reducers is proportional to the time of the serial algorithm for triangle
finding.
Neighborhoods : The neighborhood of radius d for a node v in a directed or undirected graph is the set of nodes reach-
able from v along paths of length at most d . The neighborhood profile of a node is the sequence of neighborhood
sizes for all distances from 1 upwards. The diameter of a connected graph is the smallest d for which the neighbor-
hood of radius d for any starting node includes the entire graph.
Transitive Closure : A node v can reach node u if u is in the neighborhood of v for some radius. The transitive closure
of a graph is the set of pairs of nodes ( v, u ) such that v can reach u .
Computing Transitive Closure : Since the transitive closure can have a number of facts equal to the square of the
number of nodes of a graph, it is infeasible to compute transitive closure directly for large graphs. One approach is
to find strongly connected components of the graph and collapse them each to a single node before computing the
transitive closure.
Transitive Closure and MapReduce : We can view transitive closure computation as the iterative join of a path rela-
tion (pairs of nodes v and u such that u is known to be reachable from v ) and the arc relation of the graph. Such an
approach requires a number of MapReduce rounds equal to the diameter of the graph.
Transitive Closure by Recursive Doubling : An approach that uses fewer MapReduce rounds is to join the path rela-
tion with itself at each round. At each round, we double the length of paths that are able to contribute to the transitive
closure. Thus, the number of needed rounds is only the base-2 logarithm of the diameter of the graph.
Smart Transitive Closure : While recursive doubling can cause the same path to be considered many times, and thus
increases the total computation time (compared with iteratively joining paths with single arcs), a variant called smart
transitive closure avoids discovering the same path more than once. The trick is to require that, when joining two
paths, the first has a length that is a power of 2.
Approximating Neighborhood Sizes : By using the Flajolet-Martin technique for approximating the number of dis-
tinct elements in a stream, we can find the neighborhood sizes at different radii approximately. We maintain a set of
tail lengths for each node. To increase the radius by 1, we examine each edge ( u, v ) and for each tail length for u we
set it equal to the corresponding tail length for v if the latter is larger than the former.
10.10 References for Chapter 10
Simrank comes from [ 8 ]. An alternative approach in [ 11 ] views similarity of two nodes as
the propability that random walks from the two nodes will be at the same node. [ 3 ] com-
bines random walks with node classification to predict links in a social-network graph. [ 16 ]
looks at the efficiency of computing simrank as a personalized PageRank.
Search WWH ::




Custom Search