Internal Sorting - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

The first inner for loop initializes array cnt . The second loop counts the

number of records to be assigned to each bin. The third loop sets the values in cnt

to their proper indices within array B . Note that the index stored in cnt[j] is the

last index for bin j ; bins are filled from high index to low index. The fourth loop

assigns the records to the bins (within array B ). The final loop simply copies the

records back to array A to be ready for the next pass. Variable rtoi stores r i for

use in bin computation on the i'th iteration. Figure 7.19 shows how this algorithm

processes the input shown in Figure 7.17.

This algorithm requires k passes over the list of n numbers in base r, with

(n + r) work done at each pass. Thus the total work is (nk + rk). What is

this in terms of n? Because r is the size of the base, it might be rather small.

One could use base 2 or 10. Base 26 would be appropriate for sorting character

strings. For now, we will treat r as a constant value and ignore it for the purpose of

determining asymptotic complexity. Variable k is related to the key range: It is the

maximum number of digits that a key may have in base r. In some applications we

can determine k to be of limited size and so might wish to consider it a constant.

In this case, Radix Sort is (n) in the best, average, and worst cases, making it the

sort with best asymptotic complexity that we have studied.

Is it a reasonable assumption to treat k as a constant? Or is there some rela-

tionship between k and n? If the key range is limited and duplicate key values are

common, there might be no relationship between k and n. To make this distinction

clear, use N to denote the number of distinct key values used by the n records.

Thus, N n. Because it takes a minimum of log r N base r digits to represent N

distinct key values, we know that k log r N.

Now, consider the situation in which no keys are duplicated. If there are n

unique keys (n = N), then it requires n distinct code values to represent them.

Thus, k log r n. Because it requires at least (log n) digits (within a constant

factor) to distinguish between the n distinct keys, k is in (log n). This yields

an asymptotic complexity of (n log n) for Radix Sort to process n distinct key

values.

It is possible that the key range is much larger; log r n bits is merely the best

case possible for n distinct values. Thus, the log r n estimate for k could be overly

optimistic. The moral of this analysis is that, for the general case of n distinct key

values, Radix Sort is at best a (n log n) sorting algorithm.

Radix Sort can be much improved by making base r be as large as possible.

Consider the case of an integer key value. Set r = 2 i for some i. In other words,

the value of r is related to the number of bits of the key processed on each pass.

Each time the number of bits is doubled, the number of passes is cut in half. When

processing an integer key value, setting r = 256 allows the key to be processed one

byte at a time. Processing a 32-bit key requires only four passes. It is not unrea-

sonable on most computers to use r = 2 16 = 64K, resulting in only two passes for

Search WWH ::

Custom Search

Home