Performance Analysis of Adapting a MapReduce Framework to Dynamically Accommodate Heterogeneity. Transactions on Large-Scale Data- and Knowledge-Centered Systems XX, page 127

Fig. 11. This contour plot shows how varying the proportions of two kinds of upgraded nodes within a cluster affects computation time. The case shown is 72 tasks in a 24-node cluster, assuming 32 sub-tasks per task. The X-axis shows the percentage of the cluster upgraded to Faster nodes, and the Y-axis shows the percentage upgraded to Fastest nodes. Impossible points have been interpolated. The solid lines indicate the trends in the data.
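
The configuration space behind Fig. 11 can be enumerated directly. This is a minimal sketch, not the authors' tooling: the 25% grid step, the function name `cluster_mixes`, and the reading of "impossible points" as combinations whose Faster and Fastest shares together exceed 100% of the cluster are all assumptions. The block also makes the caption's arithmetic explicit: 72 tasks on 24 nodes is 3 tasks per node, and 32 sub-tasks per task gives 2,304 sub-tasks in total.

```python
from itertools import product

NODES = 24               # cluster size from the Fig. 11 caption
TASKS = 72               # tasks in the job
SUBTASKS_PER_TASK = 32   # sub-tasks assumed per task


def cluster_mixes(step_pct=25):
    """Enumerate feasible (faster %, fastest %) grid points and node counts.

    Hypothetical helper. A point is skipped as "impossible" when the two
    upgraded shares add up to more than 100% of the cluster (an assumed
    interpretation of the caption). Node counts are rounded down.
    """
    mixes = []
    pcts = range(0, 101, step_pct)
    for faster_pct, fastest_pct in product(pcts, pcts):
        if faster_pct + fastest_pct > 100:
            continue  # cannot upgrade more than the whole cluster
        faster = NODES * faster_pct // 100
        fastest = NODES * fastest_pct // 100
        base = NODES - faster - fastest  # nodes left un-upgraded
        mixes.append((faster_pct, fastest_pct, base, faster, fastest))
    return mixes


tasks_per_node = TASKS // NODES              # 72 / 24 = 3 tasks per node
total_subtasks = TASKS * SUBTASKS_PER_TASK   # 72 * 32 = 2304 sub-tasks
```

With the default 25% step, the grid has 25 points, of which 15 are feasible; the remaining 10 are the kind of points a contour plot would have to fill in by interpolation.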

7 Conclusions

As we discussed in [9], we can accommodate heterogeneity in a cluster by increasing the number of tasks associated with each worker node. Through experiments with variable data sizes, varying degrees of cluster heterogeneity, and different data-partitioning rules, we have obtained the following results:

- As the input data size grows 4.5-fold, the overhead caused by the increased number of tasks decreases, so performance improves only when the file size is large. With a ratio of four tasks per worker, the overall execution time increases by an average of 7.553% for the smallest file and decreases by an average of 1.661% for the largest file. Therefore, frameworks should consider mitigating heterogeneity with a bag-of-tasks mechanism only when the file size is large.

- An increase in task granularity can improve performance even in clusters that do not have a high degree of heterogeneity. For example, increasing task granularity from two to four tasks per worker yields, on average, a 3.13% improvement in execution time across our runs on the largest input file. In particular, with four tasks per node, improvements appear with as little as a 25% cluster upgrade, whereas with two tasks per node they do not appear until a 75% upgrade.
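
The trade-off behind these results (finer task granularity helps faster nodes absorb more work, but each extra task adds overhead) can be illustrated with a small discrete-event simulation of a bag-of-tasks scheduler. This is a minimal sketch under stated assumptions, not the framework evaluated here: tasks are equal-sized, idle workers pull the next task from a shared bag, node speeds are relative numbers, and a fixed per-task `overhead` stands in for scheduling and start-up cost. The function `bag_of_tasks_makespan` and all of its parameters are hypothetical.

```python
import heapq


def bag_of_tasks_makespan(speeds, tasks_per_worker, total_work=1.0, overhead=0.0):
    """Simulated finish time of a job run as a shared bag of equal tasks.

    speeds: relative processing speed of each worker (work units per time unit).
    tasks_per_worker: granularity knob; the job is split into
        len(speeds) * tasks_per_worker equal-sized tasks.
    overhead: hypothetical fixed cost paid once per task.
    """
    n_tasks = len(speeds) * tasks_per_worker
    task_work = total_work / n_tasks
    # Min-heap of (time the worker becomes idle, worker index).
    idle_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(idle_at)
    makespan = 0.0
    for _ in range(n_tasks):
        # The earliest-idle worker pulls the next task from the bag.
        t, i = heapq.heappop(idle_at)
        done = t + overhead + task_work / speeds[i]
        makespan = max(makespan, done)
        heapq.heappush(idle_at, (done, i))
    return makespan


if __name__ == "__main__":
    upgraded = [1.0, 1.0, 1.0, 3.0]  # hypothetical: 1 of 4 nodes upgraded (25%)
    uniform = [1.0, 1.0, 1.0, 1.0]   # homogeneous cluster
    for tpw in (2, 4):
        print(tpw,
              bag_of_tasks_makespan(upgraded, tpw),
              bag_of_tasks_makespan(uniform, tpw, overhead=0.01))
```

With the hypothetical speeds above, raising granularity from two to four tasks per worker lowers the simulated makespan from 0.25 to 0.1875 time units, because the upgraded node keeps pulling small tasks instead of idling. In the uniform cluster with a per-task overhead, the finer split only adds cost (about 0.27 versus 0.29), and raising `total_work` while holding `overhead` fixed shrinks that penalty relative to the total. The toy model therefore reproduces only the direction of these trade-offs, not the measured percentages.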
