Database Reference
In-Depth Information
Rather than have a final round that merges these five files into a single sorted file, the
merge saves a trip to disk by directly feeding the reduce function in what is the last phase:
the reduce phase . This final merge can come from a mixture of in-memory and on-disk
segments.
NOTE
The number of files merged in each round is actually more subtle than this example suggests. The goal is
to merge the minimum number of files to get to the merge factor for the final round. So if there were 40
files, the merge would not merge 10 files in each of the four rounds to get 4 files. Instead, the first round
would merge only 4 files, and the subsequent three rounds would merge the full 10 files. The 4 merged
files and the 6 (as yet unmerged) files make a total of 10 files for the final round. The process is illus-
trated in Figure 7-5 .
Note that this does not change the number of rounds; it's just an optimization to minimize the amount of
data that is written to disk, since the final round always merges directly into the reduce.
Search WWH ::




Custom Search