Java Reference
In-Depth Information
appropriate input file. When an output buffer is full, write it to the appropriate
output file.
6. Repeat steps 2 through 5, using the original output files as input files. On the
second pass, the first two records of each input run file are already in sorted
order. Thus, these two runs may be merged and output as a single run of four
elements.
7. Each pass through the run files provides larger and larger runs until only one
run remains.
Example8.6 Using the input of Figure 8.6, we first create runs of length
one split between two input files. We then process these two input files
sequentially, making runs of length two. The first run has the values 20 and
36, which are output to the first output file. The next run has 13 and 17,
which is output to the second file. The run 14, 28 is sent to the first file,
then run 15, 23 is sent to the second file, and so on. Once this pass has
completed, the roles of the input files and output files are reversed. The
next pass will merge runs of length two into runs of length four. Runs 20,
36 and 13, 17 are merged to send 13, 17, 20, 36 to the first output file. Then
runs 14, 28 and 15, 23 are merged to send run 14, 15, 23, 28 to the second
output file. In the final pass, these runs are merged to form the final run 13,
14, 15, 17, 20, 23, 28, 36.
This algorithm can easily take advantage of the double buffering techniques
described in Section 8.3. Note that the various passes read the input run files se-
quentially and write the output run files sequentially. For sequential processing and
double buffering to be effective, however, it is necessary that there be a separate
I/O head available for each file. This typically means that each of the input and
output files must be on separate disk drives, requiring a total of four disk drives for
maximum efficiency.
The external Mergesort algorithm just described requires that log n passes be
made to sort a file of n records. Thus, each record must be read from disk and
written to disk log n times. The number of passes can be significantly reduced by
observing that it is not necessary to use Mergesort on small runs. A simple modi-
fication is to read in a block of data, sort it in memory (perhaps using Quicksort),
and then output it as a single sorted run.
Example8.7 Assume that we have blocks of size 4KB, and records are
eight bytes with four bytes of data and a 4-byte key. Thus, each block con-
tains 512 records. Standard Mergesort would require nine passes to gener-
ate runs of 512 records, whereas processing each block as a unit can be done
 
Search WWH ::




Custom Search