The disjoint set class - Data Structures and Problem Solving Using Java

Java Reference

In-Depth Information

In the remainder of this section, we prove a slightly weaker result. We

show that any sequence of M = Ω ( N ) union and find operations takes a total of

O ( M log* N ) time. The same bound holds if we replace union-by-rank with

union-by-size. This analysis is probably the most complex in this text and is

one of the first truly complex analyses ever performed for an algorithm that is

essentially trivial to implement. By extending this technique, we can show the

stronger bound claimed previously.

24.6.1 analysis of the union/find algorithm

In this section, we establish a fairly tight bound on the running time of a

sequence of M = Ω ( N ) union and find operations. The union and find opera-

tions may occur in any order, but union is done by rank and find is done with

path compression.

We begin with some theorems concerning the number of nodes of rank r .

Intuitively, because of the union-by-rank rule, there are many more nodes of small

rank than of large rank. In particular, there can be at most one node of rank log N .

What we want to do is to produce as precise a bound as possible on the number of

nodes of any particular rank r . Because ranks change only when union operations

are performed (and then only when the two trees have the same rank), we can

prove this bound by ignoring path compression. We do so in Theorem 24.1.

Theorem 24.1

In the absence of path compression, when a sequence of union instructions is being

executed, a node of rank r must have 2 r descendants (including itself).

Proof

The proof is by induction. The basis r = 0 is clearly true. Let T be the tree of rank r

with the fewest number of descendants and x be T 's root. Suppose that the last

union with which x was involved was between T 1 and T 2 . Suppose that T 1 's root was

x . If T 1 had rank r , then T 1 would be a tree of rank r with fewer descendants than T .

This condition contradicts the assumption that T is the tree with the smallest number

of descendants. Hence the rank of T 1 is at most r - 1 . The rank of T 2 is at most the

rank of T 1 because of union-by-rank. As T has rank r and the rank could only

increase because of T 2 , it follows that the rank of T 2 is r - 1 . Then the rank of T 1 is

also r - 1 . By the induction hypothesis, each tree has at least 2 r - 1 descendants,

giving a total of 2 r and establishing the theorem.

Theorem 24.1 says that if no path compression is performed, any node of

rank r must have at least 2 r descendants. Path compression can change this

condition, of course, because it can remove descendants from a node. However,

Search WWH ::

Custom Search

Home