The degree of performance heterogeneity in a cluster influences MapReduce
application performance. Some MapReduce frameworks can exploit a relatively
small number of upgraded nodes to mitigate stragglers and improve performance,
but not all upgrades influence performance equally. For example, an application
that benefits significantly from upgrading the first 25% of nodes may see no
further improvement when an additional 25% or even 50% of nodes are upgraded.
MARLA's fine-grained splitting of jobs into a larger number of smaller tasks,
and its further splitting of each task into one sub-task per core (on the
cluster node with the most cores), yields the best results for clusters with
the most performance heterogeneity. For homogeneous clusters, however, having
many tasks and sub-tasks introduces overhead to tackle a straggler problem that
is less pronounced.
Clusters with as few as three different classes of nodes can exhibit particular
configurations that support significantly improved performance, but not every
upgrade automatically leads to commensurate performance gains.
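The two-level splitting described in this section can be illustrated with simple arithmetic. The following is a minimal sketch, not MARLA's actual code: the helper name `plan_splits`, the `tasks_per_node` multiplier, and the example core counts are all assumptions introduced here for illustration. The only rule taken from the text is that each task is divided into one sub-task per core of the node with the most cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitPlan:
    num_tasks: int          # number of tasks the job is divided into
    subtasks_per_task: int  # sub-tasks inside each task
    total_subtasks: int     # num_tasks * subtasks_per_task


def plan_splits(cores_per_node, tasks_per_node):
    """Hypothetical helper: derive a split plan from per-node core counts.

    cores_per_node: list of core counts, one entry per cluster node.
    tasks_per_node: assumed tuning knob; larger values mean more, smaller tasks.
    """
    if not cores_per_node or tasks_per_node < 1:
        raise ValueError("need at least one node and tasks_per_node >= 1")
    # Assumption: the task count scales with the number of nodes.
    num_tasks = len(cores_per_node) * tasks_per_node
    # From the text: one sub-task per core, sized for the node with the most cores.
    subtasks_per_task = max(cores_per_node)
    return SplitPlan(num_tasks, subtasks_per_task, num_tasks * subtasks_per_task)


# Example: a small cluster with three classes of nodes (assumed core counts).
plan = plan_splits([2, 2, 4, 8], tasks_per_node=4)
print(plan)
```

Raising `tasks_per_node` in this sketch gives faster nodes more opportunities to pick up work that slower nodes have not yet started. On a homogeneous cluster the same setting mainly adds scheduling overhead, which matches the trade-off noted above.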
8 Future Work
Our future work will encompass many facets of MapReduce and the processing of
large data sets. We will run further experiments to determine how the results
presented here apply to other classes of applications, and we will test this in
both heterogeneous and homogeneous settings with respect to memory, storage,
and the network interconnect between nodes. Once we have explored many classes
of applications, we intend to use the information collected to define a
mathematical model that will help determine the optimal data-split
configuration for a static cluster and a given class of workload. Development
of such a model will encourage the use of MARLA in data centers and HPC
environments with centralized data stores that currently cannot get the full
benefit of Hadoop due to its early binding of tasks.
Another future direction is to use the insights gathered from this work
towards achieving energy efficiency with respect to our MapReduce framework
in a heterogeneous, non-dedicated cluster. Our future work seeks to develop
an efficient MapReduce framework that can dynamically assess the energy
consumption of worker nodes. We believe that our work should not require nodes
to be outfitted with expensive power meters, as such a requirement will make
this framework impractical for many potential consumers. The results discussed
in this work will lead us toward development of an energy-aware, efficient,
elastic, dynamic MapReduce framework that can be deployed on any number of nodes.