5 Variable Data Size Through Upgrade
This section describes results from tests that vary three different aspects of a
MapReduce matrix multiply application running over MARLA. In particular:
- Increasing the split granularity, the number of tasks per worker node into
which the original data set is split, provides more opportunity for Faster nodes
to receive and complete more work in smaller chunks than slower nodes. In
a 16-node cluster, results describe sets of runs with data split into 16 tasks
(1 per node), 32 tasks (2 per node), 48 tasks (3 per node), and 64 tasks (4 per
node).
- Altering the performance-heterogeneity of the cluster influences the degree to
which the system requires straggler mitigation. Results describe sets of runs on
a homogeneous system of all Baseline nodes (labeled “0 % Faster” in figures),
a system with 25 % of the system upgraded to Faster nodes, systems with 50 %
and 75 % Faster nodes, and a homogeneous system of 100 % Faster nodes.
- Varying the problem size ensures that trends exist as computational requirements
of the application increase. Experiments set the size of matrices at
33 × 33 floating point numbers, and set the number of such matrices in
the input data at 500 K, 750 K, 1 M, 1.25 M, 1.5 M, 1.75 M, 2 M, and 2.25 M
matrices.
Four split granularities, five performance-heterogeneity levels, and eight input
set sizes translate to 160 different tests. Graphs depict the averages of ten runs
of each test. We plot portions of the data in several different ways to explore
trends and highlight results that provide insight.
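The size of the test matrix can be reproduced by enumerating the three experimental dimensions described above (a sketch; the variable names are illustrative, not taken from the paper's test harness):

```python
from itertools import product

# Experimental dimensions from the text.
splits = [16, 32, 48, 64]              # tasks per 16-node cluster (1-4 per node)
upgrade_levels = [0, 25, 50, 75, 100]  # percent of nodes upgraded to Faster
input_sizes_k = [500, 750, 1000, 1250, 1500, 1750, 2000, 2250]  # matrices, in thousands

# Every combination of split granularity, heterogeneity level, and problem size.
tests = list(product(splits, upgrade_levels, input_sizes_k))
print(len(tests))  # 4 * 5 * 8 = 160 distinct test configurations
```

With ten runs averaged per configuration, this corresponds to 1,600 total runs behind the plotted data.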
5.1 Traditional Coarse-Grained Splits
Figure 1 plots only the data for the coarsest split granularity of one task
per worker node. This split mirrors the default behavior in Hadoop and explicitly
disallows straggler mitigation because all nodes (no matter their capability)
receive exactly one task at the outset of the application. Each group of five bars
corresponds to a different problem size along the x-axis, the y-axis reflects execution
time, and each bar corresponds to a different performance-heterogeneity
(or upgrade) level. Larger problem sizes take longer to finish, and clusters with
75 % and 100 % upgraded nodes outperform less capable clusters. However, a
homogeneous cluster with all Baseline nodes and clusters with 25 % and 50 %
upgraded nodes all perform the same.
To understand this behavior, consider an example. Suppose we have N worker
nodes and we assign N + 1 approximately equal sized tasks to them.
In order for this running time to be comparable to the case where we have
N tasks for N nodes, we would need a cluster configured in such a way that
the fastest node is nearly twice as fast as the slowest node. In this scenario, the
fastest node takes two tasks of equal size, and the slowest node takes one task
of that same size. This implies that the execution time of the job is not related
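The arithmetic in this example can be checked with a minimal simulation (the speeds and node count below are hypothetical values chosen for illustration; makespan is simply the finish time of the slowest node):

```python
def makespan(task_counts, speeds, task_size=1.0):
    # Each node processes its assigned tasks sequentially; the job
    # completes when the last node finishes.
    return max(n * task_size / s for n, s in zip(task_counts, speeds))

N = 4  # small node count for illustration

# Case A: N equal tasks on N identical Baseline nodes (speed 1 each).
base = makespan([1] * N, [1.0] * N)

# Case B: N + 1 tasks, where the fastest node (speed 2) takes two
# tasks and every other node takes one.
upgraded = makespan([2] + [1] * (N - 1), [2.0] + [1.0] * (N - 1))

print(base, upgraded)  # both 1.0: the makespans match only because
                       # one node runs twice as fast as the rest
```

If the fast node were any slower than 2x, its second task would extend the makespan past Case A, which is why the comparison in the text requires the fastest node to be nearly twice as fast as the slowest.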