requests the information necessary to rebuild the package that has been assigned.
Two parameters are involved: the number of friends, m, and the size of the data
to be restored, the packet size.
To study the impact of the packet size on step 2C, we performed a first
experiment that measures the execution time as a function of the packet size,
which varies from 5 to 128 MB. During the execution of step 2C, the number of
chunks generated in the split phase is not important. What matters at the
implementation stage is the number of available mappers. A maximum number of
mappers ensures that step 2C completes in a minimum number of iterations, a
single iteration if possible. With 1.3 GB of input data and a 5 MB packet size,
we would have to process 260 packets, against only 11 for a packet size of
128 MB. The execution time is composed of the Collect phase and the Combine
phase.
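The packet counts quoted above can be checked with a short calculation. This is only an illustrative sketch: the function name `packet_count` is hypothetical, and it assumes that 1.3 GB is taken as 1300 MB and that a final partial packet counts as a full packet.

```python
def packet_count(input_mb: int, packet_mb: int) -> int:
    """Number of packets needed to cover the input (ceiling division)."""
    return -(-input_mb // packet_mb)


INPUT_MB = 1300  # assumption: 1.3 GB taken as 1300 MB

for packet_mb in (5, 128):
    print(f"{packet_mb} MB packets: {packet_count(INPUT_MB, packet_mb)}")
```

Under these assumptions the sketch gives 260 packets for 5 MB and 11 packets for 128 MB, matching the figures in the text.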
Figure 6 presents the results. As we can see, a packet size of 5 MB yields a
Collect phase up to twice as long as for the other sizes, for which the
duration remains practically unchanged. Nevertheless, we observe a minimum
time for packet sizes between 10 and 16 MB. Indeed, the duration measured with
a 5 MB packet includes the two iterations needed to process the 260 packets
assigned to 180 mappers. As for the Combine phase, its duration also reaches a
minimum for packet sizes between 10 and 16 MB. Beyond that, the larger the
packet size, the longer the Combine phase. This increase is explained by the
fact that applying the rabin-combine routine to larger data amounts to
handling larger matrices.
We can conclude that the duration of a single iteration of step 2C is minimal
with a smaller packet size. In contrast, a small size generates more packets,
which leads to more iterations to assign all the packets. Accordingly, the
right choice is an appropriate size, given the number of available mappers n,
that minimizes the total number of iterations. In our example, to stay close to
the optimal value for systems implementing the MapReduce model, we choose
16 MB for the rest of the experiments.
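The trade-off between packet size and number of iterations can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper name `iterations` is hypothetical, 1.3 GB is assumed to be 1300 MB, each mapper is assumed to process one packet per iteration, and 180 mappers are used as in the experiment described above.

```python
def iterations(input_mb: int, packet_mb: int, mappers: int) -> int:
    """Iterations needed when each mapper handles one packet per iteration."""
    packets = -(-input_mb // packet_mb)   # ceiling division
    return -(-packets // mappers)         # ceiling division


INPUT_MB = 1300   # assumption: 1.3 GB taken as 1300 MB
MAPPERS = 180     # number of mappers reported for the experiment

for packet_mb in (5, 10, 16, 128):
    n_iter = iterations(INPUT_MB, packet_mb, MAPPERS)
    print(f"packet size {packet_mb:>3} MB -> {n_iter} iteration(s)")
```

Under these assumptions, only the 5 MB packet size requires two iterations; 10, 16 and 128 MB all fit in a single iteration, so the choice among them depends on the per-iteration Collect and Combine durations.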
Fig. 6. Time of Collect & Combine according to the packet size
Fig. 7. Time of Collect & Combine according to m
 