Database Reference
In-Depth Information
combined, say by using the more general patterns of communication permitted by a work-
flow system (see
Section 2.4.1
).
Path Facts Versus Paths
We should be careful to distinguish between a path, which is a sequence of arcs, and a path fact, which is a statement
that there exists a path from some node
x
to some node
y
. The path fact has been shown typically as
Path
(
x, y
). Smart
transitive closure discovers each path only once, but it may discover a path fact more than once. The reason is that
often a graph will have many paths from
x
to
y
, and may even have many different paths from
x
to
y
that are of the
same length.
Not all paths are discovered independently by smart transitive closure. For instance, if there are arcs
w
→
x
→
y
→
z
and also arcs
x
→
u
→
z
, then the path fact
Path
(
w, z
) will be discovered twice, once by combining
Path
(
w, y
) with
Path
(
y, z
) and again when combining
Path
(
w, u
) with
Path
(
u, z
). On the other hand, if the arcs were
w
→
x
→
y
→
z
and
w
→
v
→
y
, then
Path
(
w, z
) would be discovered only once, when combining
Path
(
w, y
) with
Path
(
y, z
).
10.8.6
Transitive Closure by Graph Reduction
A typical directed graph such as the Web contains many strongly connected components
(SCCs). We can collapse an SCC to a single node as far as computing the transitive closure
is concerned, since all the nodes in an SCC reach exactly the same nodes. There is an eleg-
ant algorithm for finding the SCCs of a graph in time linear in the size of the graph, due to
J.E. Hopcroft and R.E. Tarjan. However, this algorithm is inherently sequential, based on
depth-first search, and so not well suited to parallel impelementation on large graphs.
We can find most of the SCCs in a graph by some random node selections followed by
two breadth-first searches. Moreover, the larger an SCC is, the more likely it is to be col-
lapsed early, thus reducing the size of the graph quickly. The algorithm for reducing SCCs
to single nodes is as follows. Let
G
be the graph to be reduced, and let
G
′ be
G
with all the
arcs reversed.
(1) Pick a node
v
from
G
at random.
(2) Find
N
G
(
v,
∞), the set of nodes reachable from
v
in
G
.
(3) Find
N
G
′
(
v,
∞), the set of nodes that
v
reaches in the graph
G
′ that has the arcs of
G
reversed. Equivalently, this set is those nodes that reach
v
in
G
.
(4) Construct the SCC
S
containing
v
, which is
N
G
(
v,
∞) ∩
N
G
′
(
v,
∞). That is,
v
and
u
are
in the same SCC of
G
if and only if
v
can reach
u
and
u
can reach
v
.
(5) Replace SCC
S
by a single node
s
in
G
. To do so, delete all nodes in
S
from
G
and add
s
to the node set of
G
. Delete from
G
all arcs of which one or both ends are in
S
. Then,
add to the arc set of
G
an arc
s
→
x
whenever there was an arc in
G
from any member
of
S
to
x
. Finally, add an arc
x
→
s
if there was an arc from
x
to any member of
S
.