previously produced for the same task. This simulates a situation where previous spills
appear as if they were just produced by the task. If no local checkpoint is
available, the node recomputes the task from the beginning. Query metadata
checkpointing, on the other hand, pushes intermediate results to reducers as soon
as map tasks complete and keeps track of those incoming key-value pairs that
produce local partitions and hence are not shipped to another node for processing.
Therefore, in case of a node failure, the RAFT scheduler can recompute the local partitions.
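The local-checkpoint fallback described above can be illustrated with a minimal sketch. It is not RAFT's implementation: the function name recover_task, the dictionary of checkpoints, and the recompute callback are all hypothetical names chosen for this example, and a spill is assumed to be a plain list of key-value pairs.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a spill is modeled as a list of (key, value) pairs.
Spill = List[Tuple[str, int]]


def recover_task(
    task_id: str,
    local_checkpoints: Dict[str, List[Spill]],
    recompute: Callable[[str], List[Spill]],
) -> Tuple[List[Spill], str]:
    """Return the spills for a restarted task and where they came from.

    If a local checkpoint exists, its spills are reused as if the task
    had just produced them; otherwise the task is recomputed from the
    beginning via the (hypothetical) recompute callback.
    """
    saved = local_checkpoints.get(task_id)
    if saved is not None:
        return saved, "checkpoint"
    return recompute(task_id), "recomputed"


# Usage: map-1 has a checkpoint, map-2 does not.
checkpoints = {"map-1": [[("a", 1), ("b", 2)]]}
spills, source = recover_task("map-1", checkpoints, lambda t: [])
```

The sketch covers only the choice between reusing spills and recomputing; tracking which key-value pairs produce local partitions, as query metadata checkpointing does, is left out.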
In general, energy consumption and cooling are large components of the operational
cost of datacenters [14]. Therefore, cluster-level energy management of the
MapReduce framework is another interesting system optimization aspect. In principle,
the energy efficiency of a cluster can be improved in two ways [89]:
1. By matching the number of active nodes to the current needs of the workload
and placing the remaining nodes in low-power standby modes.
2. By engineering the compute and storage features of each node to match its
workload and avoid energy wastage due to oversized components.
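As a rough illustration of the first approach, the number of nodes to keep active can be derived from the current demand. This is a sketch under simple assumptions, not a method taken from [89]: demand and per-node capacity are measured in the same hypothetical unit (for example, task slots), and the function name and parameters are invented for this example.

```python
import math


def active_nodes_needed(
    demand: float,
    per_node_capacity: float,
    total_nodes: int,
    min_nodes: int = 1,
) -> int:
    """Number of nodes to keep powered up; the rest can go to standby.

    Assumes demand and per_node_capacity use the same unit and that
    every node has the same capacity.
    """
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    needed = math.ceil(demand / per_node_capacity)
    # Never drop below the minimum or exceed the cluster size.
    return max(min_nodes, min(total_nodes, needed))


# Usage: a 10-node cluster where each node handles 100 units of demand.
active = active_nodes_needed(demand=250, per_node_capacity=100, total_nodes=10)
standby = 10 - active
```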
Lang and Patel [86] investigated powering down (and powering
up) nodes of a MapReduce cluster to save energy during periods of low utilization.
In particular, they compared the following two strategies for MapReduce
energy management:
1. Covering Set (CS) strategy that keeps only a small fraction of the nodes
powered up during periods of low utilization.
2. All-In Strategy (AIS) that uses all the nodes in the cluster to run a workload
and then powers down the entire cluster.
The results from this comparison show that there are two crucial factors that
affect the effectiveness of these two methods:
1. The computational complexity of the workload.
2. The time taken to transition nodes between a low-power (deep hibernation)
state and a high-performance state.
The evaluation shows that CS is more effective than AIS only when the computational
complexity of the workload is low (e.g., linear) and the time it takes for the
hardware to transition a node to and from a low-power state is a relatively large fraction
of the overall workload time (i.e., the workload execution time is small). In all
other cases, AIS outperforms CS in terms of both energy savings
and response time.
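To see why these two factors interact, consider a toy energy model. It is an illustrative assumption, not the model used by Lang and Patel [86]: every node draws a fixed active or standby power, the run time on n nodes is assumed to be (data / n) raised to a complexity exponent, CS nodes stay powered up for the whole comparison window, and all names and numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    nodes: int           # total nodes in the cluster
    cs_nodes: int        # nodes kept powered up under CS
    p_active: float      # watts drawn by a powered-up node
    p_standby: float     # watts drawn by a node in the low-power state
    transition_s: float  # seconds AIS spends powering nodes up and down per job


def run_time(data: float, n: int, exponent: float) -> float:
    """Seconds to process `data` units split evenly over n nodes (toy model)."""
    return (data / n) ** exponent


def compare(c: Cluster, data: float, exponent: float) -> dict:
    """Compare CS and AIS over a window that ends when the slower one finishes."""
    t_cs = run_time(data, c.cs_nodes, exponent)
    t_ais = c.transition_s + run_time(data, c.nodes, exponent)
    window = max(t_cs, t_ais)
    # CS: the covering set stays on, all other nodes stay in standby.
    e_cs = window * (c.cs_nodes * c.p_active
                     + (c.nodes - c.cs_nodes) * c.p_standby)
    # AIS: every node is on while transitioning and running, then in standby.
    e_ais = c.nodes * (c.p_active * t_ais + c.p_standby * (window - t_ais))
    return {
        "cs_time": t_cs, "ais_time": t_ais,
        "cs_energy": e_cs, "ais_energy": e_ais,
        "lower_energy": "CS" if e_cs < e_ais else "AIS",
    }


cluster = Cluster(nodes=10, cs_nodes=2, p_active=200.0,
                  p_standby=10.0, transition_s=30.0)
linear = compare(cluster, data=100.0, exponent=1.0)
quadratic = compare(cluster, data=100.0, exponent=2.0)
```

With these hypothetical numbers, the linear workload is short enough that the transition overhead dominates and CS uses less energy, while the quadratic workload runs so much longer on the small covering set that AIS uses less energy. The sketch compares energy only; response time is reported but not used to pick the strategy.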
2.4 SYSTEMS OF DECLARATIVE INTERFACES
FOR THE MapReduce FRAMEWORK
For programmers, a key appealing feature of the MapReduce framework is that there
are only two main high-level declarative primitives (map and reduce) that can be