Performance Analysis of Adapting a MapReduce
Framework to Dynamically Accommodate
Heterogeneity
Jessica Hartog, Renan DelValle, Madhusudhan Govindaraju,
and Michael J. Lewis
Department of Computer Science,
State University of New York (SUNY) at Binghamton,
Binghamton, NY 13902, USA
{jhartog1,rdelval1,mgovinda,mlewis}@binghamton.edu
http://www.cs.binghamton.edu
Abstract. When data centers employ the common and economical
practice of upgrading subsets of nodes incrementally, rather than
replacing or upgrading all nodes at once, they end up with clusters whose
nodes have non-uniform processing capability, which we also call
performance-heterogeneity. Popular frameworks supporting the effective
MapReduce programming model for Big Data applications do not flexibly
adapt to these environments. Instead, existing MapReduce frameworks,
including Hadoop, typically divide data evenly among worker nodes, thereby
inducing the well-known problem of stragglers on slower nodes. Our
alternative MapReduce framework, called MARLA, divides each worker's labor
into sub-tasks, delays the binding of data to worker processes, and thereby
enables applications to run faster in performance-heterogeneous
environments. This approach does introduce overhead, however. We explore and
characterize the opportunity for performance gains, and identify when
the benefits outweigh the costs. Our results suggest that frameworks
should support finer grained sub-tasking and dynamic data partitioning
when running on some performance-heterogeneous clusters. Blindly taking
this approach in homogeneous clusters can slow applications down.
Our study further suggests the opportunity for cluster managers to build
performance-heterogeneous clusters by design, if they also run MapReduce
frameworks that can exploit them.
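The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not MARLA's implementation; it is a minimal simulation under assumed conditions: work is perfectly divisible, each node has a fixed relative speed, and every sub-task pays a fixed scheduling cost. The function names, the `overhead` parameter, and the example numbers are all hypothetical.

```python
import heapq


def static_makespan(total_work, speeds):
    """Even split: every node gets the same share, so the slowest node
    (the straggler) determines the finish time."""
    share = total_work / len(speeds)
    return max(share / s for s in speeds)


def dynamic_makespan(total_work, speeds, num_subtasks, overhead=0.0):
    """Late binding: the work is cut into equal sub-tasks, and whichever
    node becomes idle first takes the next one.

    `overhead` is an assumed fixed cost paid per sub-task.
    """
    chunk = total_work / num_subtasks
    # Min-heap of (time at which the node becomes idle, node index).
    idle_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(idle_at)
    finish = 0.0
    for _ in range(num_subtasks):
        t, i = heapq.heappop(idle_at)
        done = t + overhead + chunk / speeds[i]
        finish = max(finish, done)
        heapq.heappush(idle_at, (done, i))
    return finish


if __name__ == "__main__":
    mixed = [1.0, 1.0, 2.0, 2.0]   # two slow nodes, two fast nodes
    uniform = [1.0, 1.0, 1.0, 1.0]
    print(static_makespan(120, mixed))
    print(dynamic_makespan(120, mixed, 24))
    print(static_makespan(120, uniform))
    print(dynamic_makespan(120, uniform, 24, overhead=0.5))
```

In this model the heterogeneous cluster finishes sooner with sub-tasking because the fast nodes absorb more sub-tasks, while the homogeneous cluster finishes later because it pays the per-sub-task cost without any load to rebalance.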
1 Introduction
Scientists continue to develop applications that generate, process, and
analyze large amounts of data. The MapReduce programming model helps express
operations on Big Data. The model and its associated framework
implementations, including Hadoop [1], successfully support applications such as genome
sequencing in bioinformatics [2, 3], and catalog indexing of celestial objects in
This work was supported in part by NSF grant CNS-0958501.
 
 