Database Reference
In-Depth Information
For the inductive step, suppose we have computed
R
i
(
v, d
) for all
i
and
v
. Initialize
R
i
(
v,
d
+1) to be
R
i
(
v, d
), for all
i
and
v
. Then, consider all arcs
x
→
y
in the graph, in any order.
For each
x
→
y
, set
R
i
(
x, d
+ 1) to the larger of its current value and
R
i
(
y, d
). Observe that
the fact we can consider the arcs in any order may provide a big speedup in the case that we
can store the
R
i
in main memory, while the set of arcs is so large it must be stored on disk.
We can stream all the disk blocks containing arcs one at a time, thus using only one disk
access per iteration per disk block used for arc storage. This advantage is similar to the one
we observed in
Section 6.2.1
,
where we pointed out how frequent-itemset algorithms like
A-priori could take advantage of reading market-basket data in a stream, where each disk
block was read only once for each round.
To estimate the size of
N
(
v, d
), combine the values of the
R
i
(
v, d
) for
i
= 1
,
2
, . . . , k
, as
discussed in
Section 4.4.3
.
That is, group the
R
's into small groups, take the average, and
take the median of the averages.
Another improvement to the ANF Algorithm can be had if we are only interested in es-
timating the sizes of the reachable sets,
N
(
v,
∞). It is then not necessary to save
R
i
(
v, d
) for
different radii
d
. We can maintain one value
R
i
(
v
) for each hash function
h
i
and each node
v
. When, on any round, we consider arc
x
→
y
, we simply assign
R
i
(
x
) := max(
R
i
(
x
)
, R
i
(
y
))
We can stop the iteration when at some round no value of any
R
i
(
v
) changes. Or if we know
d
is the diameter of the graph, we can just iterate
d
times.
10.8.8
Exercises for Section 10.8
EXERCISE
10.8.1 For the graph of
Fig. 10.9
,
which we repeat here as
Fig. 10.24
:
(a) If the graph is represented as a directed graph, how many arcs are there?
(b) What are the neighborhood profiles for nodes
A
and
B
?
(c) What is the diameter of the graph?
(d) How many pairs are in the transitive closure?
Hint
: Do not forget that there are paths
of length greater than zero from a node to itself in this graph.
(e) If we compute the transitive closure by recursive doubling, how many rounds are
needed?