A priority queue: the binary heap - Data Structures and Problem Solving Using Java

Java Reference

In-Depth Information

21.6.6 replacement selection

The last topic we consider in this chapter is construction of the runs. The strat-

egy used so far is the simplest: We read as many elements as possible and sort

them, writing the result to a tape. This seems like the best approach possible,

until we realize that as soon as the first element is written to the output tape,

the memory it used becomes available for another element. If the next element

on the input tape is larger than the element just output, it can be included in

the run.

Using this observation, we can write an algorithm for producing runs,

commonly called replacement selection. Initially, M elements are read into

memory and placed in a priority queue efficiently with a single buildHeap . We

perform a deleteMin , writing the smallest element to the output tape. We read

the next element from the input tape. If it is larger than the element just writ-

ten, we can add it to the priority queue; otherwise, it cannot go into the current

run. Because the priority queue is smaller by one element, this element is

stored in the dead space of the priority queue until the run has been completed

and is then used for the next run. Storing an element in the dead space is

exactly what is done in heapsort. We continue doing this process until the size

of the priority queue is 0, at which point the run is over. We start a new run by

rebuilding a new priority queue with a buildHeap operation, in the process

using all of the elements in the dead space.

Figure 21.38 shows the run construction for the small example we have

been using, with M = 3. Elements that are reserved for the next run are

shaded. Elements 11, 94, and 81 are placed with buildHeap . Element 11 is out-

put, and then 96 is placed in the heap by an insertion because it is larger than

11. Element 81 is output next, and then 12 is read. As 12 is smaller than the 81

just output, it cannot be included in the current run. Thus it is placed in the

heap dead space. The heap now logically contains only 94 and 96. After they

are output, we have only dead space elements, so we construct a heap and

begin run 2.

In this example, replacement selection produces only 3 runs, compared to

the 5 runs obtained by sorting. As a result, a three-way merge finishes in one

pass instead of two. If the input is randomly distributed, replacement selection

produces runs of average length 2 M . For our large example, we would expect

160 runs instead of 320 runs, so a five-way merge would still require four

passes. In this case, we have not saved a pass, although we might if we get

lucky and have 125 runs or fewer. Because external sorts take so long, every

pass saved can make a significant difference in the running time.

As we have shown, replacement selection may do no better than the stan-

dard algorithm. However, the input is frequently nearly sorted to start with, in

If we are clever, we

can make the

length of the runs

that we initially con-

struct larger than

the amount of avail-

able main memory.

This technique is

called replacement

selection .

Data Structures and Problem Solving Using Java

Search WWH ::

Custom Search

Home