Java Reference
In-Depth Information
To demonstrate how slow external accesses really are, we could create a
random file that is large but not too big to fit in main memory. When we read
in the file and sort it by using an efficient algorithm, the time needed to read
the input is likely to be significant compared to the time required to sort the
input, even though sorting is an O ( N log N ) operation (or worse for Shellsort)
and reading the input is only O ( N ).
21.6.2 model for external sorting
The wide variety of mass storage devices makes external sorting much more
device-dependent than internal sorting. The algorithms considered here work
on tapes, which are probably the most restrictive storage medium. Access to
an element on tape is gained by winding the tape to the correct location, so
tapes can be efficiently accessed only in sequential order (in either direction).
Let us assume that we have at least three tape drives for performing the
sort. We need two drives to do an efficient sort; the third drive simplifies mat-
ters. If only one tape drive is present, we are in trouble: Any algorithm will
require Ω ( N 2 ) tape accesses.
We assume that
sorts are performed
on tape. Only
sequential access
of the input is
allowed.
21.6.3 the simple algorithm
The basic external sorting algorithm involves the use of the merge routine
from mergesort. Suppose that we have four tapes A1, A2, B1, and B2, which
are two input and two output tapes. Depending on the point in the algorithm,
the A tapes are used for input and the B tapes for output, or vice versa. Sup-
pose further that the data are initially on A1 and that the internal memory can
hold (and sort) M records at a time. The natural first step is to read M records
at a time from the input tape, sort the records internally, and then write the
sorted records alternately to B1 and B2. Each group of sorted records is called
a run. When done, we rewind all the tapes. If we have the same input as in our
example for Shellsort, the initial configuration is as shown in Figure 21.29. If
The basic external
sort uses repeated
two-way merging.
Each group of
sorted records is a
run . As a result of a
pass, the length of
the runs doubles
and eventually only
a single run
remains.
figure 21.29
Initial tape
configuration
A1
81 4 1 6 2 5 7 9 8
58
41
75
15
A2
B1
B2
 
Search WWH ::




Custom Search