Java Reference
In-Depth Information
Falling Snow
Future snow
Existing snow
Snowplow Movement
Start time T
Figure8.9 The snowplow analogy showing the action during one revolution of
the snowplow. A circular track is laid out straight for purposes of illustration, and
is shown in cross section. At any time T, the most snow is directly in front of
the snowplow. As the plow moves around the track, the same amount of snow is
always in front of the plow. As the plow moves forward, less of this is snow that
was in the track at time T; more is snow that has fallen since.
the file. While R should be much less than the total number of records (because
the initial runs should each contain many records), we would like to reduce still
further the number of passes required to merge the runs together. Note that two-
way merging does not make good use of available memory. Because merging is a
sequential process on the two runs, only one block of records per run need be in
memory at a time. Keeping more than one block of a run in memory at any time
will not reduce the disk I/O required by the merge process (though if several blocks
are read from a file at once time, at least they take advantage of sequential access).
Thus, most of the space just used by the heap for replacement selection (typically
many blocks in length) is not being used by the merge process.
We can make better use of this space and at the same time greatly reduce the
number of passes needed to merge the runs if we merge several runs at a time.
Multiway merging is similar to two-way merging. If we have B runs to merge,
with a block from each run available in memory, then the B-way merge algorithm
simply looks at B values (the front-most value for each input run) and selects the
smallest one to output. This value is removed from its run, and the process is
repeated. When the current block for any run is exhausted, the next block from that
run is read from disk. Figure 8.10 illustrates a multiway merge.
Conceptually, multiway merge assumes that each run is stored in a separate file.
However, this is not necessary in practice. We only need to know the position of
each run within a single file, and use seek to move to the appropriate block when-
ever we need new data from a particular run. Naturally, this approach destroys the
ability to do sequential processing on the input file. However, if all runs were stored
on a single disk drive, then processing would not be truly sequential anyway be-
cause the I/O head would be alternating between the runs. Thus, multiway merging
replaces several (potentially) sequential passes with a single random access pass. If
Search WWH ::




Custom Search