Internal Sorting - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

is still unlikely to happen. It does not take many good partitionings for Quicksort

to work fairly well.

Quicksort's best case occurs when findpivot always breaks the array into

two equal halves. Quicksort repeatedly splits the array into smaller partitions, as

shown in Figure 7.14. In the best case, the result will be log n levels of partitions,

with the top level having one array of size n, the second level two arrays of size n=2,

the next with four arrays of size n=4, and so on. Thus, at each level, all partition

steps for that level do a total of n work, for an overall cost of n log n work when

Quicksort finds perfect pivots.

Quicksort's average-case behavior falls somewhere between the extremes of

worst and best case. Average-case analysis considers the cost for all possible ar-

rangements of input, summing the costs and dividing by the number of cases. We

make one reasonable simplifying assumption: At each partition step, the pivot is

equally likely to end in any position in the (sorted) array. In other words, the pivot

is equally likely to break an array into partitions of sizes 0 and n1, or 1 and n2,

and so on.

Given this assumption, the average-case cost is computed from the following

equation:

n1

X

1

n

T(n) = cn +

[T(k) + T(n 1 k)]; T(0) = T(1) = c:

k=0

This equation is in the form of a recurrence relation. Recurrence relations are

discussed in Chapters 2 and 14, and this one is solved in Section 14.2.4. This

equation says that there is one chance in n that the pivot breaks the array into

subarrays of size 0 and n 1, one chance in n that the pivot breaks the array into

subarrays of size 1 and n2, and so on. The expression “T(k) + T(n1k)” is

the cost for the two recursive calls to Quicksort on two arrays of size k and n1k.

The initial cn term is the cost of doing the findpivot and partition steps, for

some constant c. The closed-form solution to this recurrence relation is (n log n).

Thus, Quicksort has average-case cost (n log n).

This is an unusual situation that the average case cost and the worst case cost

have asymptotically different growth rates. Consider what “average case” actually

means. We compute an average cost for inputs of size n by summing up for every

possible input of size n the product of the running time cost of that input times the

probability that that input will occur. To simplify things, we assumed that every

permutation is equally likely to occur. Thus, finding the average means summing

up the cost for every permutation and dividing by the number of inputs (n!). We

know that some of these n! inputs cost O(n 2 ). But the sum of all the permutation

costs has to be (n!)(O(n log n)). Given the extremely high cost of the worst inputs,

there must be very few of them. In fact, there cannot be a constant fraction of the

inputs with cost O(n 2 ). Even, say, 1% of the inputs with cost O(n 2 ) would lead to

Data Structures and Algorithm Analysis

Search WWH ::

Custom Search

Home