Database Reference
In-Depth Information
For the inductive step, suppose we have computed R i ( v, d ) for all i and v . Initialize R i ( v,
d +1) to be R i ( v, d ), for all i and v . Then, consider all arcs x y in the graph, in any order.
For each x y , set R i ( x, d + 1) to the larger of its current value and R i ( y, d ). Observe that
the fact we can consider the arcs in any order may provide a big speedup in the case that we
can store the R i in main memory, while the set of arcs is so large it must be stored on disk.
We can stream all the disk blocks containing arcs one at a time, thus using only one disk
access per iteration per disk block used for arc storage. This advantage is similar to the one
we observed in Section 6.2.1 , where we pointed out how frequent-itemset algorithms like
A-priori could take advantage of reading market-basket data in a stream, where each disk
block was read only once for each round.
To estimate the size of N ( v, d ), combine the values of the R i ( v, d ) for i = 1 , 2 , . . . , k , as
discussed in Section 4.4.3 . That is, group the R 's into small groups, take the average, and
take the median of the averages.
Another improvement to the ANF Algorithm can be had if we are only interested in es-
timating the sizes of the reachable sets, N ( v, ∞). It is then not necessary to save R i ( v, d ) for
different radii d . We can maintain one value R i ( v ) for each hash function h i and each node
v . When, on any round, we consider arc x y , we simply assign
R i ( x ) := max( R i ( x ) , R i ( y ))
We can stop the iteration when at some round no value of any R i ( v ) changes. Or if we know
d is the diameter of the graph, we can just iterate d times.
10.8.8
Exercises for Section 10.8
EXERCISE 10.8.1 For the graph of Fig. 10.9 , which we repeat here as Fig. 10.24 :
(a) If the graph is represented as a directed graph, how many arcs are there?
(b) What are the neighborhood profiles for nodes A and B ?
(c) What is the diameter of the graph?
(d) How many pairs are in the transitive closure? Hint : Do not forget that there are paths
of length greater than zero from a node to itself in this graph.
(e) If we compute the transitive closure by recursive doubling, how many rounds are
needed?
Search WWH ::




Custom Search