NFS alongside HDFS when processing Big Data [17]. Since HDFS does not support late-binding of tasks to workers, and since that is precisely the aspect of this framework we wish to study, we limit our study to an NFS-based environment.
4.1 Clusters with Two Levels of Nodes
The first set of experiments varies the cluster configuration, the split granularity (that is, the number of tasks per node into which the framework splits the problem), and the input data size. In particular, we run tests for all combinations of the following:
- Cluster configuration: 16-node clusters with some Baseline nodes and some Faster nodes, varying the percentages of each in increments of four nodes, or 25% of the cluster nodes.²
- Split granularity: We vary the number of tasks per node from one to four. To utilize the upgraded nodes most effectively, the number-of-cores parameter of the MARLA framework is set to eight. Recall that this parameter defines how many sub-tasks to attribute to each task.
- Problem size: We use input matrices of 33 × 33 randomly generated floating-point values, multiplying 500 K, 750 K, 1 M, 1.25 M, 1.5 M, 1.75 M, 2 M, and 2.25 M matrices during execution of the various MapReduce jobs.
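The full experiment grid above is the Cartesian product of the three parameter lists. A minimal Python sketch of its enumeration (the variable names are illustrative and not taken from the MARLA codebase):

```python
from itertools import product

CLUSTER_SIZE = 16

# Baseline/Faster mixes, upgraded in increments of 4 nodes (25% of the cluster).
cluster_configs = [(CLUSTER_SIZE - faster, faster)        # (baseline, faster)
                   for faster in range(0, CLUSTER_SIZE + 1, 4)]

tasks_per_node = [1, 2, 3, 4]                             # split granularity

matrix_counts = [500_000, 750_000, 1_000_000, 1_250_000,  # problem sizes
                 1_500_000, 1_750_000, 2_000_000, 2_250_000]

experiments = list(product(cluster_configs, tasks_per_node, matrix_counts))
print(len(experiments))  # 5 mixes x 4 granularities x 8 sizes = 160 runs
```

Each of the 160 tuples corresponds to one MapReduce job configuration in this set of experiments.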
Section 5 contains results for this set of experiments.
4.2 Clusters with Three Levels of Nodes
The second set of experiments studies the effect of introducing the third class of Fastest nodes. We begin with a 24-node cluster containing all Baseline nodes, then test a variety of upgrade combinations. In particular, we vary the number of Faster nodes from zero to twenty-four, in increments of two, and simultaneously vary the number of Fastest nodes from zero to twelve, in increments of two. We use the tuple notation <b, f, t> to indicate the number of nodes at each level (b = Baseline, f = Faster, t = Fastest). We run tests for all tuples <b, f, t> in the following set:

{<b, f, t> | b ∈ [0, 24], f ∈ [0, 24], t ∈ [0, 12]; b/2, f/2, t/2 ∈ ℕ; b + f + t = 24}

In this configuration, we also vary the number of cores per worker alongside the number of tasks. This is done to identify what happens when the number of cores in the configuration file does not reflect the actual number of cores on the most powerful of the nodes. To do this, we consider splitting the tasks into 8 sub-tasks, as in the previous experiments; we also consider splitting the tasks into 32 sub-tasks in an effort to take full advantage of the Fastest nodes. As with the previous set of experiments, we also vary the number of tasks, in the same manner as before: from one to four times the number of nodes in the cluster. Section 6 contains results for this set of experiments.
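The constraints on <b, f, t> — all three counts even, t at most twelve, and the counts summing to the cluster size — can be enumerated directly. A short Python sketch (illustrative only, not part of the MARLA tooling):

```python
# Enumerate the valid <b, f, t> tuples for the 24-node, three-level cluster:
# b, f, t vary in increments of two, t <= 12, and b + f + t = 24.
TOTAL_NODES = 24

tuples = [(b, f, t)
          for b in range(0, TOTAL_NODES + 1, 2)   # Baseline nodes
          for f in range(0, TOTAL_NODES + 1, 2)   # Faster nodes
          for t in range(0, 13, 2)                # Fastest nodes
          if b + f + t == TOTAL_NODES]

print(len(tuples))  # number of distinct cluster configurations tested
```

Enumerating this way shows the set contains the all-Baseline starting point (24, 0, 0) as well as fully upgraded mixes such as (0, 12, 12).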
² We do not use the Fastest node configuration for this set of experiments.