map task. For each checkpoint, RAFT stores a metadata triplet: taskID, a unique task identifier; spillID, the local path to the spilled data; and offset, the position of the last byte of input data processed in that spill. To recover from a task failure, the RAFT scheduler reallocates the failed task to the same node that was running it. The node then resumes the task from the last checkpoint and reuses the spills previously produced for the same task, so that the previous spills appear as if they had just been produced by the task. If no local checkpoint is available, the node recomputes the task from the beginning. Query metadata checkpointing, on the other hand, pushes intermediate results to reducers as soon as map tasks complete and keeps track of the incoming key-value pairs that produce local partitions and are therefore not shipped to another node for processing. In case of a node failure, the RAFT scheduler can then recompute the local partitions.
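The local checkpointing scheme can be sketched as follows. This is an illustrative reconstruction, not RAFT's actual code: the class and function names (Checkpoint, CheckpointStore, recovery_start_offset) are assumptions made for the example; only the triplet fields (taskID, spillID, offset) come from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Checkpoint:
    task_id: str    # unique task identifier (taskID)
    spill_id: str   # local path to the spilled data (spillID)
    offset: int     # last byte of input data processed in this spill

class CheckpointStore:
    """Hypothetical per-node store keeping the latest checkpoint per task."""

    def __init__(self) -> None:
        self._latest: Dict[str, Checkpoint] = {}

    def record(self, cp: Checkpoint) -> None:
        # Keep only the most recent checkpoint for each task,
        # i.e. the one covering the largest input offset.
        cur = self._latest.get(cp.task_id)
        if cur is None or cp.offset > cur.offset:
            self._latest[cp.task_id] = cp

    def latest(self, task_id: str) -> Optional[Checkpoint]:
        return self._latest.get(task_id)

def recovery_start_offset(store: CheckpointStore, task_id: str) -> int:
    """Where a reallocated task resumes on the same node: just past the
    last checkpointed byte, or from byte 0 if no local checkpoint exists."""
    cp = store.latest(task_id)
    return cp.offset + 1 if cp is not None else 0
```

Because the spills referenced by spillID survive on the node's local disk, resuming from `recovery_start_offset` lets the task reuse them as if it had just produced them itself.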
In general, energy consumption and cooling are large components of the operational cost of datacenters [74]. Therefore, cluster-level energy management of the MapReduce framework is another interesting system optimization aspect. In principle, the energy efficiency of a cluster can be improved in two ways [174]:
1. By matching the number of active nodes to the current needs of the workload and
placing the remaining nodes in low-power standby modes.
2. By engineering the compute and storage features of each node to match its
workload and avoid energy wastage due to oversized components.
Lang and Patel [169] have investigated powering down (and powering up) nodes of a MapReduce cluster in order to save energy during periods of low utilization. In particular, they compared the following two strategies for MapReduce energy management:
1. The Covering Set (CS) strategy, which keeps only a small fraction of the nodes powered up during periods of low utilization.
2. The All-In Strategy (AIS), which uses all the nodes in the cluster to run a workload and then powers down the entire cluster.
The results of this comparison show that two crucial factors affect the effectiveness of these two methods:
• The computational complexity of the workload.
• The time taken to transition nodes between a low-power (deep hibernation) state and a high-performance state.
The evaluation shows that CS is more effective than AIS only when the computational complexity of the workload is low (e.g., linear) and the time the hardware takes to transition a node into and out of a low-power state is a relatively large fraction of the overall workload time (i.e., the workload execution time is small). In all other cases, AIS outperforms CS in terms of both energy savings and response time.
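The decision rule implied by this evaluation can be sketched as a small function. This is an illustration of the stated finding, not code from Lang and Patel's study; the function name and the 0.25 cutoff for a "relatively large" transition-time fraction are assumptions made for the example.

```python
def choose_energy_strategy(workload_is_low_complexity: bool,
                           transition_time_s: float,
                           workload_time_s: float,
                           large_fraction: float = 0.25) -> str:
    """Return 'CS' or 'AIS' per the evaluation's finding: CS wins only when
    the workload's computational complexity is low (e.g., linear) AND the
    node power-state transition time is a large fraction of the overall
    workload time. `large_fraction` is a hypothetical cutoff, not a value
    from the study."""
    transition_fraction = transition_time_s / workload_time_s
    if workload_is_low_complexity and transition_fraction >= large_fraction:
        return "CS"
    return "AIS"
```

For example, a linear workload that runs for 2 minutes on hardware needing 1 minute to hibernate and wake favors CS, while the same workload running for hours (making the transition cost negligible) favors AIS.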