5 Variable Data Size Through Upgrade
This section describes results from tests that vary three different aspects of a
MapReduce matrix multiply application running over MARLA. In particular:
- Increasing the split granularity, the number of tasks per worker node into
which the original data set is split, provides more opportunity for Faster nodes
to receive and complete more work in smaller chunks than slower nodes. In
a 16-node cluster, results describe sets of runs with data split into 16 tasks
(1 per node), 32 tasks (2 per node), 48 tasks (3 per node), and 64 tasks (4 per
node).
- Altering the performance-heterogeneity of the cluster influences the degree to
which the system requires straggler mitigation. Results describe sets of runs on
a homogeneous system of all Baseline nodes (labeled “0 % Faster” in figures),
a system with 25 % of the system upgraded to Faster nodes, systems with 50 %
and 75 % Faster nodes, and a homogeneous system of 100 % Faster nodes.
- Varying the problem size ensures that trends exist as computational requirements
of the application increase. Experiments set the size of matrices at
33 × 33 floating point numbers, and set the number of such matrices in
the input data at 500 K, 750 K, 1 M, 1.25 M, 1.5 M, 1.75 M, 2 M, and 2.25 M
matrices.
Four split granularities, five performance-heterogeneity levels, and eight input
set sizes translate to 160 different tests. Graphs depict the averages of ten runs
of each test. We plot portions of the data in several different ways to explore
trends and highlight results that provide insight.
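The size of the test matrix can be reproduced by enumerating the three experimental dimensions described above (a sketch; the variable names are illustrative, not taken from the paper's test harness):

```python
from itertools import product

# Experimental dimensions from the text.
splits = [16, 32, 48, 64]              # tasks per 16-node cluster (1-4 per node)
upgrade_levels = [0, 25, 50, 75, 100]  # percent of nodes upgraded to Faster
input_sizes_k = [500, 750, 1000, 1250, 1500, 1750, 2000, 2250]  # matrices, in thousands

# Every combination of split granularity, heterogeneity level, and problem size.
tests = list(product(splits, upgrade_levels, input_sizes_k))
print(len(tests))  # 4 * 5 * 8 = 160 distinct test configurations
```

With ten runs averaged per configuration, this corresponds to 1,600 total runs behind the plotted data.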
5.1 Traditional Coarse-Grained Splits
Figure 1 plots only the data for the coarsest split granularity of one task
per worker node. This split mirrors the default behavior in Hadoop and explicitly
disallows straggler mitigation because all nodes (no matter their capability)
receive exactly one task at the outset of the application. Each group of five bars
corresponds to a different problem size along the x-axis, the y-axis reflects execution
time, and each bar corresponds to a different performance-heterogeneity
(or upgrade) level. Larger problem sizes take longer to finish, and clusters with
75 % and 100 % upgraded nodes outperform less capable clusters. However, a
homogeneous cluster with all Baseline nodes and clusters with 25 % and 50 %
upgraded nodes all perform the same.
To understand this behavior, consider an example. Suppose we have N worker
nodes and we assign N + 1 approximately equal sized tasks to them.
In order for this running time to be comparable to the case where we have
N tasks for N nodes, we would need a cluster configured in such a way that
the fastest node is nearly twice as fast as the slowest node. In this scenario, the
fastest node takes two tasks of equal size, and the slowest node takes one task
of that same size. This implies that the execution time of the job is not related
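The arithmetic in this example can be checked with a minimal simulation (the speeds and node count below are hypothetical values chosen for illustration; makespan is simply the finish time of the slowest node):

```python
def makespan(task_counts, speeds, task_size=1.0):
    # Each node processes its assigned tasks sequentially; the job
    # completes when the last node finishes.
    return max(n * task_size / s for n, s in zip(task_counts, speeds))

N = 4  # small node count for illustration

# Case A: N equal tasks on N identical Baseline nodes (speed 1 each).
base = makespan([1] * N, [1.0] * N)

# Case B: N + 1 tasks, where the fastest node (speed 2) takes two
# tasks and every other node takes one.
upgraded = makespan([2] + [1] * (N - 1), [2.0] + [1.0] * (N - 1))

print(base, upgraded)  # both 1.0: the makespans match only because
                       # one node runs twice as fast as the rest
```

If the fast node were any slower than 2x, its second task would extend the makespan past Case A, which is why the comparison in the text requires the fastest node to be nearly twice as fast as the slowest.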