Internal Sorting - Data Structures and Algorithm Analysis

Java Reference

In-Depth Information

a 32-bit key. Of course, this requires a cnt array of size 64K. Performance will

be good only if the number of records is close to 64K or greater. In other words,

the number of records must be large compared to the key size for Radix Sort to be

efficient. In many sorting applications, Radix Sort can be tuned in this way to give

good performance.

Radix Sort depends on the ability to make a fixed number of multiway choices

based on a digit value, as well as random access to the bins. Thus, Radix Sort

might be difficult to implement for certain key types. For example, if the keys

are real numbers or arbitrary length strings, then some care will be necessary in

implementation. In particular, Radix Sort will need to be careful about deciding

when the “last digit” has been found to distinguish among real numbers, or the last

character in variable length strings. Implementing the concept of Radix Sort with

the trie data structure (Section 13.1) is most appropriate for these situations.

At this point, the perceptive reader might begin to question our earlier assump-

tion that key comparison takes constant time. If the keys are “normal integer”

values stored in, say, an integer variable, what is the size of this variable compared

to n? In fact, it is almost certain that 32 (the number of bits in a standard int vari-

able) is greater than log n for any practical computation. In this sense, comparison

of two long integers requires (log n) work.

Computers normally do arithmetic in units of a particular size, such as a 32-bit

word. Regardless of the size of the variables, comparisons use this native word

size and require a constant amount of time since the comparison is implemented in

hardware. In practice, comparisons of two 32-bit values take constant time, even

though 32 is much greater than log n. To some extent the truth of the proposition

that there are constant time operations (such as integer comparison) is in the eye

of the beholder. At the gate level of computer architecture, individual bits are

compared. However, constant time comparison for integers is true in practice on

most computers (they require a fixed number of machine instructions), and we rely

on such assumptions as the basis for our analyses. In contrast, Radix Sort must do

several arithmetic calculations on key values (each requiring constant time), where

the number of such calculations is proportional to the key length. Thus, Radix Sort

truly does (n log n) work to process n distinct key values.

7.8

An Empirical Comparison of Sorting Algorithms

Which sorting algorithm is fastest? Asymptotic complexity analysis lets us distin-

guish between (n 2 ) and (n log n) algorithms, but it does not help distinguish

between algorithms with the same asymptotic complexity. Nor does asymptotic

analysis say anything about which algorithm is best for sorting small lists.

For

answers to these questions, we can turn to empirical testing.

Search WWH ::

Custom Search

Home