map task. For each checkpoint, RAFT stores a metadata triplet: taskID, a unique task identifier; spillID, the local path to the spilled data; and offset, the position of the last byte of input data processed in that spill. To recover from a task failure, the RAFT scheduler reallocates the failed task to the same node that was running it. The node then resumes the task from the last checkpoint and reuses the spills previously produced for the same task, so that the previous spills appear as if they had just been produced by the task. If no local checkpoint is available, the node recomputes the task from the beginning. Query metadata checkpointing, on the other hand, pushes intermediate results to reducers as soon as map tasks complete and keeps track of the incoming key-value pairs that produce local partitions and are therefore not shipped to another node for processing. In case of a node failure, the RAFT scheduler can then recompute the local partitions.
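The local checkpointing scheme can be sketched as follows. This is an illustrative reconstruction, not RAFT's actual code: the class and function names (Checkpoint, CheckpointStore, recovery_start_offset) are assumptions made for the example; only the triplet fields (taskID, spillID, offset) come from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class Checkpoint:
    task_id: str    # unique task identifier (taskID)
    spill_id: str   # local path to the spilled data (spillID)
    offset: int     # last byte of input data processed in this spill

class CheckpointStore:
    """Hypothetical per-node store keeping the latest checkpoint per task."""

    def __init__(self) -> None:
        self._latest: Dict[str, Checkpoint] = {}

    def record(self, cp: Checkpoint) -> None:
        # Keep only the most recent checkpoint for each task,
        # i.e. the one covering the largest input offset.
        cur = self._latest.get(cp.task_id)
        if cur is None or cp.offset > cur.offset:
            self._latest[cp.task_id] = cp

    def latest(self, task_id: str) -> Optional[Checkpoint]:
        return self._latest.get(task_id)

def recovery_start_offset(store: CheckpointStore, task_id: str) -> int:
    """Where a reallocated task resumes on the same node: just past the
    last checkpointed byte, or from byte 0 if no local checkpoint exists."""
    cp = store.latest(task_id)
    return cp.offset + 1 if cp is not None else 0
```

Because the spills referenced by spillID survive on the node's local disk, resuming from `recovery_start_offset` lets the task reuse them as if it had just produced them itself.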
In general, energy consumption and cooling are large components of the operational cost of datacenters [74]. Therefore, cluster-level energy management of the MapReduce framework is another interesting system optimization aspect. In principle, the energy efficiency of a cluster can be improved in two ways [174]:
1. By matching the number of active nodes to the current needs of the workload and
placing the remaining nodes in low-power standby modes.
2. By engineering the compute and storage features of each node to match its
workload and avoid energy wastage due to oversized components.
Lang and Patel [169] have investigated powering down (and powering up) nodes of a MapReduce cluster in order to save energy during periods of low utilization. In particular, they compared the following two strategies for MapReduce energy management:
1. The Covering Set (CS) strategy, which keeps only a small fraction of the nodes powered up during periods of low utilization.
2. The All-In Strategy (AIS), which uses all the nodes in the cluster to run a workload and then powers down the entire cluster.
The results of this comparison show that two crucial factors affect the effectiveness of these two methods:
• The computational complexity of the workload.
• The time taken to transition nodes between a low-power (deep hibernation) state and a high-performance state.
The evaluation shows that CS is more effective than AIS only when the computational complexity of the workload is low (e.g., linear) and the time the hardware takes to transition a node into and out of a low-power state is a relatively large fraction of the overall workload time (i.e., the workload execution time is small). In all other cases, AIS outperforms CS in terms of both energy savings and response time.
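The decision rule implied by this evaluation can be sketched as a small function. This is an illustration of the stated finding, not code from Lang and Patel's study; the function name and the 0.25 cutoff for a "relatively large" transition-time fraction are assumptions made for the example.

```python
def choose_energy_strategy(workload_is_low_complexity: bool,
                           transition_time_s: float,
                           workload_time_s: float,
                           large_fraction: float = 0.25) -> str:
    """Return 'CS' or 'AIS' per the evaluation's finding: CS wins only when
    the workload's computational complexity is low (e.g., linear) AND the
    node power-state transition time is a large fraction of the overall
    workload time. `large_fraction` is a hypothetical cutoff, not a value
    from the study."""
    transition_fraction = transition_time_s / workload_time_s
    if workload_is_low_complexity and transition_fraction >= large_fraction:
        return "CS"
    return "AIS"
```

For example, a linear workload that runs for 2 minutes on hardware needing 1 minute to hibernate and wake favors CS, while the same workload running for hours (making the transition cost negligible) favors AIS.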