astroinformatics [4], by splitting data across processing nodes, applying the same
operation on each subset, and aggregating results.
When frameworks split data evenly across nodes, and when the map and
reduce functions are applied uniformly, the frameworks implicitly assume that
constituent nodes possess similar processing capability. When they do not,
straggler processes result and performance suffers [5, 6].
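The cost of uniform splitting can be sketched with a toy makespan calculation (my own illustration; the work sizes and speed ratios are invented, not taken from the paper): the slowest node determines when the whole phase completes, so an even split leaves faster nodes idle.

```python
# Toy model: each node receives a list of work units and processes them
# at its own speed; the phase ends when the last node finishes.

def makespan(per_node_chunks, node_speeds):
    """Time until the last node finishes its assigned work."""
    return max(sum(chunks) / speed
               for chunks, speed in zip(per_node_chunks, node_speeds))

records = [1] * 120                      # 120 identical units of work
speeds = [1.0, 1.0, 4.0]                 # the third node is 4x faster

even = [records[0:40], records[40:80], records[80:120]]
proportional = [records[0:20], records[20:40], records[40:120]]

print(makespan(even, speeds))            # 40.0: slow nodes straggle
print(makespan(proportional, speeds))    # 20.0: all nodes finish together
```

A speed-proportional split halves the makespan here, which is the intuition behind making faster nodes do more work.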
We refer to clusters whose nodes exhibit non-uniform processing capability as
being performance-heterogeneous. Performance heterogeneity can result from data
center administrators upgrading subsets of nodes incrementally, rather than
replacing or upgrading all cluster nodes at once. This can occur as funds become
available incrementally, as older nodes fail or become obsolete, and as new,
faster processors continue to emerge. FutureGrid [7] and NERSC [8] exemplify
performance-heterogeneous clusters. The FutureGrid test-bed is a geographically
distributed set of heterogeneous nodes that vary significantly in processor
speeds, number of cores, available memory, and storage technologies. NERSC's
Carver cluster includes a mix of Intel Nehalem quad-core, Westmere 6-core, and
Nehalem-EX 8-core processors, for a total of 9,984 cores.
Hadoop [1], the de facto standard MapReduce framework, can perform poorly
in performance-heterogeneous environments [5, 6, 9, 10]. To improve performance,
MapReduce applications, in concert with supporting frameworks, must consider
differences in the processing capabilities of the underlying nodes. Simply put,
faster nodes should perform more work in the same time, eliminating or greatly
reducing the need for applications to wait for straggler processes to finish
[6, 11]. Our MARLA MapReduce framework [9] supports partitioning of labor into
sub-tasks, and does not rely on the Hadoop Distributed File System (HDFS) [12].
Instead, it uses a standard implementation of the Network File System (NFS);
therefore, data need not reside on worker nodes before a MapReduce application
runs, and more capable nodes can eventually receive and process more data.
MARLA therefore does not require significant local storage space on worker
nodes, but does require data movement (via NFS or some other underlying file
system) at runtime.
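The idea of letting more capable nodes receive more data can be sketched as a pull-based task queue (a minimal illustration of the general technique, not MARLA's actual code): idle workers fetch the next sub-task from a shared queue, so a faster node naturally completes more sub-tasks without any explicit speed measurement.

```python
import threading
import time
from queue import Queue, Empty

def run(num_subtasks, worker_delays):
    """Workers pull sub-tasks until the queue drains; returns counts per worker."""
    tasks = Queue()
    for i in range(num_subtasks):
        tasks.put(i)
    counts = [0] * len(worker_delays)

    def worker(wid):
        while True:
            try:
                tasks.get_nowait()
            except Empty:
                return
            time.sleep(worker_delays[wid])  # simulated per-sub-task processing time
            counts[wid] += 1

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(len(worker_delays))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts

# Worker 1 is ten times faster, so it ends up processing most of the sub-tasks.
print(run(40, [0.02, 0.002]))
```

The finer the sub-tasks, the more closely the work distribution tracks the speed ratio; with one task per node, the pull model degenerates to an even split.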
In this paper, we configure a cluster to exhibit varying degrees of
performance-heterogeneity, and test the effectiveness of splitting MapReduce
applications with several degrees of granularity. Using smaller sub-tasks
increases the opportunity to react to performance-heterogeneity, but also
requires that the application pause more often to wait for data to arrive.
Our experiments help identify the circumstances under which the benefits of
fine-grained sub-tasking and delayed data partitioning outweigh the associated
costs. We vary cluster nodes to include two and three different levels of
processing capability, and configure different percentages of nodes at each
level. For each cluster environment, we divide application labor into sub-tasks
of different granularities, to help identify the best strategy for task
distribution on clusters with different characteristics.
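The granularity trade-off described above can be modeled with a back-of-the-envelope simulation (my own simplification, not the paper's experimental setup; the per-sub-task overhead stands in for pauses waiting for data): finer sub-tasks improve load balance across unequal nodes, but each sub-task adds a fixed cost, so runtime eventually rises again.

```python
def runtime(total_work, node_speeds, num_subtasks, overhead):
    """Makespan under greedy earliest-finish assignment of equal-size sub-tasks."""
    chunk = total_work / num_subtasks
    finish = [0.0] * len(node_speeds)
    for _ in range(num_subtasks):
        # Assign the next sub-task to the node that would finish it earliest.
        n = min(range(len(node_speeds)),
                key=lambda i: finish[i] + chunk / node_speeds[i])
        finish[n] += chunk / node_speeds[n] + overhead
    return max(finish)

speeds = [1.0, 1.0, 4.0]          # two slow nodes, one 4x-faster node
for n in (3, 12, 48, 480):
    print(n, round(runtime(120, speeds, n, 0.05), 2))
# Runtime first drops as granularity refines, then rises as overhead dominates.
```

Sweeping the sub-task count this way exposes a sweet spot that depends on the speed spread and the per-sub-task data-movement cost, which is precisely what the experiments in this paper probe empirically.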
This paper makes the following contributions:
- It demonstrates how incremental upgrades of a cluster can affect the
performance of MapReduce applications that do not respond to cluster
performance-heterogeneity. Application developers do not typically reap the
performance improvements that cluster providers purportedly pay for.