Java Reference
In-Depth Information
The first inner for loop initializes array cnt . The second loop counts the
number of records to be assigned to each bin. The third loop sets the values in cnt
to their proper indices within array B . Note that the index stored in cnt[j] is the
last index for bin j ; bins are filled from high index to low index. The fourth loop
assigns the records to the bins (within array B ). The final loop simply copies the
records back to array A to be ready for the next pass. Variable rtoi stores r i for
use in bin computation on the i'th iteration. Figure 7.19 shows how this algorithm
processes the input shown in Figure 7.17.
This algorithm requires k passes over the list of n numbers in base r, with
(n + r) work done at each pass. Thus the total work is (nk + rk). What is
this in terms of n? Because r is the size of the base, it might be rather small.
One could use base 2 or 10. Base 26 would be appropriate for sorting character
strings. For now, we will treat r as a constant value and ignore it for the purpose of
determining asymptotic complexity. Variable k is related to the key range: It is the
maximum number of digits that a key may have in base r. In some applications we
can determine k to be of limited size and so might wish to consider it a constant.
In this case, Radix Sort is (n) in the best, average, and worst cases, making it the
sort with best asymptotic complexity that we have studied.
Is it a reasonable assumption to treat k as a constant? Or is there some rela-
tionship between k and n? If the key range is limited and duplicate key values are
common, there might be no relationship between k and n. To make this distinction
clear, use N to denote the number of distinct key values used by the n records.
Thus, N n. Because it takes a minimum of log r N base r digits to represent N
distinct key values, we know that k log r N.
Now, consider the situation in which no keys are duplicated. If there are n
unique keys (n = N), then it requires n distinct code values to represent them.
Thus, k log r n. Because it requires at least (log n) digits (within a constant
factor) to distinguish between the n distinct keys, k is in (log n). This yields
an asymptotic complexity of (n log n) for Radix Sort to process n distinct key
values.
It is possible that the key range is much larger; log r n bits is merely the best
case possible for n distinct values. Thus, the log r n estimate for k could be overly
optimistic. The moral of this analysis is that, for the general case of n distinct key
values, Radix Sort is at best a (n log n) sorting algorithm.
Radix Sort can be much improved by making base r be as large as possible.
Consider the case of an integer key value. Set r = 2 i for some i. In other words,
the value of r is related to the number of bits of the key processed on each pass.
Each time the number of bits is doubled, the number of passes is cut in half. When
processing an integer key value, setting r = 256 allows the key to be processed one
byte at a time. Processing a 32-bit key requires only four passes. It is not unrea-
sonable on most computers to use r = 2 16 = 64K, resulting in only two passes for
Search WWH ::




Custom Search