Fig. 11. This contour plot shows how computation time varies with the mix of two
kinds of upgraded nodes within a cluster. In this case, it shows the effect for 72
tasks on a 24-node cluster, assuming 32 sub-tasks for each task. The X-axis shows the
percentage of the cluster that has been upgraded to Faster nodes, while the Y-axis
shows the percentage of the cluster that has been upgraded to Fastest nodes.
Impossible points have been interpolated. The solid lines indicate the trends in the
data.
7 Conclusions
As we discussed in [9], we are able to accommodate heterogeneity in a cluster by
increasing the number of tasks associated with each worker node. Thus far, through
experiments with variable data sizes, variable degrees of heterogeneity in the
cluster, and various data partitioning rules, we are able to provide the following
results:
- As the size of the data being processed grows 4.5-fold, the overhead produced by
an increased number of tasks decreases, so performance improves only when the
file size is large. With a four-tasks-per-worker ratio, the overall execution
time increases by an average of 7.553 % for the smallest file and decreases by
an average of 1.661 % for the largest file. Therefore, frameworks should
consider heterogeneity mitigation using a bag-of-tasks mechanism only when the
file size is large.
- An increase in task granularity can provide performance improvements even
in clusters that do not have a high degree of heterogeneity. For example,
increasing task granularity from two tasks per worker to four tasks per worker
yields, on average, a 3.13 % improvement in execution time across our runs
executed using the largest input file. In particular, improvements are seen with
as little as a 25 % cluster upgrade in the case of four tasks per node, whereas
improvements are not seen until a 75 % upgrade in the case of two tasks per
node.
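
The trade-off behind both results can be illustrated with a toy model. The
sketch below is not the framework or the experimental setup used in this work;
it is a minimal simulation under stated assumptions: work divides evenly into
equal tasks, idle workers pull the next task from a shared bag, every task pays
a fixed overhead, and ties between idle workers are broken by worker index. The
function name, the speed values, and the overhead values are all hypothetical.

```python
import heapq


def makespan(total_work, speeds, tasks_per_worker, per_task_overhead=0.0):
    """Finish time of a pull-based bag of tasks (toy model, hypothetical).

    total_work        -- abstract work units in the whole job
    speeds            -- work units per time unit for each worker
    tasks_per_worker  -- task granularity: tasks = tasks_per_worker * workers
    per_task_overhead -- fixed cost paid by a worker for every task it runs
    """
    n_tasks = tasks_per_worker * len(speeds)
    task_work = total_work / n_tasks

    # Min-heap of (time the worker becomes idle, worker index).
    idle_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(idle_at)

    finish = 0.0
    for _ in range(n_tasks):
        # The worker that becomes idle first pulls the next task.
        t, i = heapq.heappop(idle_at)
        done = t + per_task_overhead + task_work / speeds[i]
        finish = max(finish, done)
        heapq.heappush(idle_at, (done, i))
    return finish


# Assumed cluster: half the workers upgraded to twice the base speed.
cluster = [1, 1, 2, 2]

# No overhead: finer granularity lets the faster workers take more tasks.
coarse = makespan(240, cluster, tasks_per_worker=1)
fine = makespan(240, cluster, tasks_per_worker=4)

# Fixed overhead of 10 per task: small job versus a job ten times larger.
small_coarse = makespan(240, cluster, 1, per_task_overhead=10)
small_fine = makespan(240, cluster, 4, per_task_overhead=10)
large_coarse = makespan(2400, cluster, 1, per_task_overhead=10)
large_fine = makespan(2400, cluster, 4, per_task_overhead=10)
```

In this model, four tasks per worker shortens the run when per-task overhead is
zero, lengthens it for the small job once the overhead is charged, and shortens
it again for the larger job, where the same overhead is amortized over more
work. This mirrors the qualitative pattern reported above but does not
reproduce the measured percentages.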
 