Java Reference
In-Depth Information
21.6.6 replacement selection
The last topic we consider in this chapter is construction of the runs. The strat-
egy used so far is the simplest: We read as many elements as possible and sort
them, writing the result to a tape. This seems like the best approach possible,
until we realize that as soon as the first element is written to the output tape,
the memory it used becomes available for another element. If the next element
on the input tape is larger than the element just output, it can be included in
the run.
Using this observation, we can write an algorithm for producing runs,
commonly called replacement selection. Initially, M elements are read into
memory and placed in a priority queue efficiently with a single buildHeap . We
perform a deleteMin , writing the smallest element to the output tape. We read
the next element from the input tape. If it is larger than the element just writ-
ten, we can add it to the priority queue; otherwise, it cannot go into the current
run. Because the priority queue is smaller by one element, this element is
stored in the dead space of the priority queue until the run has been completed
and is then used for the next run. Storing an element in the dead space is
exactly what is done in heapsort. We continue doing this process until the size
of the priority queue is 0, at which point the run is over. We start a new run by
rebuilding a new priority queue with a buildHeap operation, in the process
using all of the elements in the dead space.
Figure 21.38 shows the run construction for the small example we have
been using, with M = 3. Elements that are reserved for the next run are
shaded. Elements 11, 94, and 81 are placed with buildHeap . Element 11 is out-
put, and then 96 is placed in the heap by an insertion because it is larger than
11. Element 81 is output next, and then 12 is read. As 12 is smaller than the 81
just output, it cannot be included in the current run. Thus it is placed in the
heap dead space. The heap now logically contains only 94 and 96. After they
are output, we have only dead space elements, so we construct a heap and
begin run 2.
In this example, replacement selection produces only 3 runs, compared to
the 5 runs obtained by sorting. As a result, a three-way merge finishes in one
pass instead of two. If the input is randomly distributed, replacement selection
produces runs of average length 2 M . For our large example, we would expect
160 runs instead of 320 runs, so a five-way merge would still require four
passes. In this case, we have not saved a pass, although we might if we get
lucky and have 125 runs or fewer. Because external sorts take so long, every
pass saved can make a significant difference in the running time.
As we have shown, replacement selection may do no better than the stan-
dard algorithm. However, the input is frequently nearly sorted to start with, in
If we are clever, we
can make the
length of the runs
that we initially con-
struct larger than
the amount of avail-
able main memory.
This technique is
called replacement
selection .
 
Search WWH ::




Custom Search