Performance Analysis of Adapting a MapReduce Framework to Dynamically Accommodate Heterogeneity - Transactions on Large-Scale Data- and Knowledge-Centered Systems XX

The degree of performance heterogeneity in a cluster influences MapReduce application performance. Some MapReduce frameworks can use relatively few upgraded nodes for straggler mitigation and improved performance. But not all upgrades influence performance equally. For example, applications that benefit significantly from upgrades to the first 25% of nodes may see no further improvement from upgrading an additional 25% or even 50% of nodes. MARLA's fine-grained splitting of jobs into a larger number of smaller tasks, and further splitting of each task into one sub-task per core (on the cluster node with the most cores), yields the best results for clusters with the most performance heterogeneity. For homogeneous clusters, however, having many tasks and sub-tasks introduces overhead to tackle a straggler problem that is less pronounced. Clusters with as few as three different classes of nodes can exhibit particular configurations that support significantly improved performance, but not every upgrade automatically leads to requisite performance gains.
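The two-level split described above (a job divided into many small tasks, each task divided into one sub-task per core of the node with the most cores) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_range` and `plan_splits`, the record-count-based ranges, and the example cluster are hypothetical and are not taken from MARLA's implementation.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open interval [start, end), counted in records


def split_range(start: int, end: int, parts: int) -> List[Range]:
    """Split [start, end) into `parts` contiguous, near-equal ranges."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(end - start, parts)
    ranges = []
    cursor = start
    for i in range(parts):
        # The first `extra` ranges take one additional record each.
        size = base + (1 if i < extra else 0)
        ranges.append((cursor, cursor + size))
        cursor += size
    return ranges


def plan_splits(num_records: int, num_tasks: int,
                cores_per_node: List[int]) -> List[List[Range]]:
    """Return, for each task, its list of sub-task ranges.

    Assumption: the sub-task count per task equals the core count of the
    node with the most cores, as the text describes.
    """
    subtasks_per_task = max(cores_per_node)
    tasks = split_range(0, num_records, num_tasks)
    return [split_range(s, e, subtasks_per_task) for s, e in tasks]


if __name__ == "__main__":
    # Hypothetical cluster: two 2-core nodes, one 4-core node, one 8-core node.
    plan = plan_splits(num_records=1000, num_tasks=8, cores_per_node=[2, 2, 4, 8])
    print(f"{len(plan)} tasks, {len(plan[0])} sub-tasks per task")
```

In this sketch, raising `num_tasks` gives finer-grained units of work, and the sub-task count follows the largest node. On a homogeneous cluster the same plan simply produces more units to manage, which is the overhead the text refers to.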

8 Future Work

Our future work will encompass many facets of MapReduce and the processing of large data sets. In addition to this work, we will run further experiments to determine how the results presented here apply to other classes of applications. We will confirm this within both heterogeneous and homogeneous settings for memory, storage, and the network interconnect between nodes. Once we have explored many classes of applications, we intend to use all of the information collected to define a mathematical model that will help determine the optimal data-split configuration for a static cluster and a given class of workload. Development of such a model will encourage use of MARLA in data centers and HPC environments with centralized data stores that currently cannot get the full benefit of Hadoop due to its early binding of tasks.

Another future direction is to use the insights gathered from this work toward achieving energy efficiency with respect to our MapReduce framework in a heterogeneous, non-dedicated cluster. Our future work seeks to develop an efficient MapReduce framework that can dynamically assess the energy consumption of worker nodes. We believe that our work should not require nodes to be outfitted with expensive power meters, as such a requirement would make this framework impractical for many potential consumers. The results discussed in this work will lead us toward development of an energy-aware, efficient, elastic, dynamic MapReduce framework that can be deployed on any number of nodes.

References

1. Apache Hadoop. http://hadoop.apache.org

2. 1000 Genomes: A Deep Catalog of Human Genetic Variation. http://www.1000genomes.org
