Combine (step 2C) performed on each mapper. We ran several measurement
scenarios, on the one hand to study the impact of various parameters on the
execution time of each phase, and on the other hand to calculate the overhead
(data overload) produced by the Split phase and its impact on the Scatter
phase. Finally, the analysis of the experiments guides us to the optimal choice
of parameters for each scenario.
4.1 Evaluation of the Split and Scatter Step (2S)
The first experiment studies the impact of input data size on the execution
time of the Split phase. We ran the split routine on an 8-core node, varying
the size of the input file from 100 MB to 1.3 GB and setting the parameters n=25
and m=10. With these values, the overhead rate is 150%. Figure 4 shows the
execution time as a function of the input file size. As we can see, the split
time increases with the size of the input data. This is because applying the
split routine to a larger file means multiplying by a larger matrix and
generating larger pieces.
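The 150% figure can be checked with a short calculation. The sketch below assumes a dispersal-style split in which each of the n pieces holds 1/m of the input, so the overhead is (n - m)/m; this definition is an assumption, chosen because it reproduces the reported value for n=25 and m=10. The function names and the decimal interpretation of 1.3 GB are illustrative only.

```python
def chunk_size(file_bytes: int, m: int) -> int:
    """Size of one piece, assuming each piece holds 1/m of the input (rounded up)."""
    return -(-file_bytes // m)


def overhead_rate(n: int, m: int) -> float:
    """Extra data produced by Split, relative to the input size (assumed definition)."""
    return (n - m) / m


def total_scattered_bytes(file_bytes: int, n: int, m: int) -> int:
    """Total volume handed to the Scatter phase: n pieces of equal size."""
    return n * chunk_size(file_bytes, m)


if __name__ == "__main__":
    n, m = 25, 10
    size = 1_300_000_000  # 1.3 GB, taken here as decimal bytes (assumption)
    print(f"overhead rate: {overhead_rate(n, m):.0%}")
    print(f"piece size:    {chunk_size(size, m)} bytes")
    print(f"scattered:     {total_scattered_bytes(size, n, m)} bytes")
```

Under these assumptions, the volume to scatter is n/m times the input size, which is why the overhead produced by Split carries over directly to the Scatter phase.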
We noted that the chunk preparation phase does not produce a significant
time overhead compared with the time MapReduce systems spend dividing data into
chunks to send to mappers.
A second experiment studies the impact of n, the number of chunks
generated by Split, on the execution time of the Split and Scatter phases. We
applied the split routine to a 1.3 GB file, setting n to 25, 82,
and 180. The results in Figure 5 are averages over the values
of the parameter m: for n=25, m varies from 10 to 24; for n=82,
m=35, 40, 45, 50, 55, 60; for n=180, m=68, 69, 70. The results show that, for the
same input data size, the duration of the Split phase increases with n. The same
holds for the Scatter phase. The larger n is, the more the key matrix, involved
in multiplication, increases in size. We deduce that it is better to keep the
number of pieces small to optimize the execution time.
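To illustrate why the split work grows with n, here is a minimal sketch of a split by key-matrix multiplication, together with the inverse step that rebuilds the data from any m pieces. It is not the authors' implementation: the Vandermonde key matrix, the field GF(257), and all function names are assumptions made for this example. In this sketch the n x m key matrix is applied to every m-byte column of the input, so the number of multiplications is n times the padded input length.

```python
P = 257  # prime just above the byte range (assumption of this sketch)


def key_matrix(n, m):
    """n x m Vandermonde matrix over GF(P); any m of its rows are invertible (n < P)."""
    return [[pow(x, j, P) for j in range(m)] for x in range(1, n + 1)]


def split(data, n, m):
    """Return n pieces; piece i holds (row i of the key matrix) . (each m-byte column)."""
    padded = data + bytes(-len(data) % m)  # zero-pad to a multiple of m
    columns = [padded[i:i + m] for i in range(0, len(padded), m)]
    return [[sum(a * b for a, b in zip(row, col)) % P for col in columns]
            for row in key_matrix(n, m)]


def solve_mod(a, b):
    """Solve a . x = b over GF(P) by Gauss-Jordan elimination (a is m x m)."""
    m = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)  # modular inverse via Fermat's little theorem
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [aug[r][m] for r in range(m)]


def combine(pieces, rows, m, length):
    """Rebuild the input from m pieces; rows[i] is the 0-based key row of pieces[i]."""
    key = key_matrix(max(rows) + 1, m)
    sub = [key[r] for r in rows]
    out = bytearray()
    for c in range(len(pieces[0])):
        out.extend(solve_mod(sub, [p[c] for p in pieces]))
    return bytes(out[:length])  # drop the zero padding


if __name__ == "__main__":
    data = b"hello mapreduce"
    pieces = split(data, n=5, m=3)
    restored = combine([pieces[4], pieces[1], pieces[2]], [4, 1, 2], 3, len(data))
    print(restored == data)
```

With the input size fixed, raising n adds rows to the key matrix and therefore adds both multiplications in the split and pieces to scatter, which matches the trend reported for Figure 5.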
Fig. 4. Duration of Split as a function of input data size
Fig. 5. Impact of n on execution time
4.2 Evaluation of the Collect and Combine Step (2C)
In order to start the map task, each mapper needs to restore the package of
meaningful data. To do so, it must contact m-1 friend mappers, of which it
 