A second experiment examines the choice of m, which is both the number of
friends to contact during the collect phase and the number of input files for
the combine routine. The experiment measures the execution time of the two
phases as a function of m. The data to disperse is 1.3 GB and the packet size
is 16 MB, which yields 82 packets. To proceed in a single iteration, we set
n=82; m therefore ranges from 35 to 60. The measured time covers the two
phases, Collect and Combine. Figure 7 shows the results. We observe that the
smaller m is, the better the execution time.
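The packet count above can be checked with a ceiling division. The sketch below is only illustrative: the helper name packet_count is hypothetical, and it assumes that 1.3 GB is read as 1300 MB, since the exact byte size of the data is not given.

```python
import math


def packet_count(data_mb: float, packet_mb: float) -> int:
    """Number of fixed-size packets needed to hold data_mb megabytes."""
    # The last packet may be only partially filled, hence the ceiling.
    return math.ceil(data_mb / packet_mb)


# Assumption: 1.3 GB taken as 1300 MB; the exact size is not stated.
print(packet_count(1300, 16))
```

Under this reading, 1300 / 16 = 81.25, so 82 packets are needed. A slightly different interpretation of the data size would shift the count by one or two packets.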
In contrast, the evaluation of step 2S, and more specifically of the split
phase, led us to deduce that m gives better execution times as it approaches
n, beyond n/2. To settle the choice of m, we re-ran the previous experiment.
Figure 8 shows the execution times of all the phases as a function of m. We
observe that the value of m that gave the minimum time in step 2C does not
lead to a globally optimal time. We conclude that the optimal value of m is
n/2.
4.3 Scenarios and Discussion
Based on the results of the experiments on the impact of the number of chunks
produced and the size of the packets, we face two alternatives. On the one
hand, we found that our system takes longer to generate more chunks (Split
phase). On the other hand, we found that the packet size leading to the
minimum execution time is 16 MB. For 1.3 GB of data, we obtain 83 packets to
handle. Would it be better to generate 83 chunks and assign each packet to a
mapper, so that step 2C (Collect and Combine) is carried out in a single
iteration? Or would it be more appropriate and efficient to generate fewer
chunks to save time during the split phase, and accept that step 2C takes
more than one iteration?
To answer these questions, we performed a comparative study of the two
scenarios. We ran four versions of the experiment, each time measuring the
duration of the four phases. Across the experiments, we varied the number of
mappers from 25 to 83. Figure 9 shows the overall execution time. The first
experiment, with 25 mappers, takes little time to create the chunks, but
requires four iterations of step 2C. 40 mappers require 40 chunks, which
increases the duration of the split phase compared with the first experiment,
but treat all packets in 3 iterations. 50 mappers require more time to
generate their corresponding chunks, but complete step 2C in two iterations.
As for 83 mappers, the master takes more than 600 seconds to perform the split
routine and prepare the chunks. Although the Collect and Combine phases then
finish in a single iteration of assignment, the overall time shows a
significant additional cost compared to the other scenarios. This overhead is
due, primarily and directly, to the split phase, which runs locally on the
master.
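The iteration counts reported for the four configurations are consistent with a simple ceiling division of the packet count by the number of mappers. The sketch below illustrates this under the assumption that each mapper handles exactly one packet per iteration of step 2C; the function name iterations_2c is hypothetical and is not taken from the system described here.

```python
import math


def iterations_2c(packets: int, mappers: int) -> int:
    """Rounds of step 2C, assuming one packet per mapper per round."""
    return math.ceil(packets / mappers)


PACKETS = 83  # packet count quoted in the text for 1.3 GB at 16 MB

for mappers in (25, 40, 50, 83):
    # Fewer mappers means fewer chunks to split, but more rounds of 2C.
    print(mappers, iterations_2c(PACKETS, mappers))
```

With 83 packets this gives 4, 3, 2 and 1 iterations for 25, 40, 50 and 83 mappers respectively, matching the values discussed above. The split time itself is measured, not derived, so it is deliberately left out of the sketch.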
 