Java Reference
In-Depth Information
Input Runs
...
5
10
15
Output Buffer
...
...
6
7
23
5
6
7
10
12
...
12
18
20
Figure8.10 Illustration of multiway merge. The first value in each input run
is examined and the smallest sent to the output. This value is removed from the
input and the process repeated. In this example, values 5, 6, and 12 are compared
first. Value 5 is removed from the first run and sent to the output. Values 10, 6,
and 12 will be compared next. After the first five values have been output, the
“current” value of each block is the one underlined.
the processing would not be sequential anyway (such as when all processing is on
a single disk drive), no time is lost by doing so.
Multiway merging can greatly reduce the number of passes required. If there
is room in memory to store one block for each run, then all runs can be merged
in a single pass. Thus, replacement selection can build initial runs in one pass,
and multiway merging can merge all runs in one pass, yielding a total cost of two
passes. However, for truly large files, there might be too many runs for each to get
a block in memory. If there is room to allocate B blocks for a B-way merge, and
the number of runs R is greater than B, then it will be necessary to do multiple
merge passes. In other words, the first B runs are merged, then the next B, and
so on. These super-runs are then merged by subsequent passes, B super-runs at a
time.
How big a file can be merged in one pass? Assuming B blocks were allocated to
the heap for replacement selection (resulting in runs of average length 2B blocks),
followed by a B-way merge, we can process on average a file of size 2B 2 blocks
in a single multiway merge. 2B k+1 blocks on average can be processed in k B-
way merges. To gain some appreciation for how quickly this grows, assume that
we have available 0.5MB of working memory, and that a block is 4KB, yielding
128 blocks in working memory. The average run size is 1MB (twice the working
memory size). In one pass, 128 runs can be merged. Thus, a file of size 128MB
can, on average, be processed in two passes (one to build the runs, one to do the
merge) with only 0.5MB of working memory. As another example, assume blocks
are 1KB long and working memory is 1MB = 1024 blocks. Then 1024 runs of
average length 2MB (which is about 2GB) can be combined in a single merge
pass. A larger block size would reduce the size of the file that can be processed
Search WWH ::




Custom Search