misconfiguration. Meanwhile, launching too many speculative tasks may take away
resources from useful tasks. Therefore, accurately estimating the progress and
remaining time of long-running jobs is an important challenge for a runtime environment
like the MapReduce framework. In particular, this information can play an
important role in improving resource allocation, enhancing task scheduling,
enabling query debugging, or tuning the cluster configuration. The ParaTimer system
[103,104] has been proposed to tackle this challenge. In particular, ParaTimer pro-
vides techniques for handling several challenges including failures and data skew. To
handle unexpected changes in query execution times such as those due to failures,
ParaTimer provides users with a set of time-remaining estimates that correspond
to the predicted query execution times in different scenarios (i.e., a single worst-
case failure or data skew at an operator). Each of these indicators can be annotated
with the scenario to which it corresponds, giving users a detailed picture of pos-
sible expected behaviors. To achieve this goal, ParaTimer estimates time remain-
ing by breaking queries into pipelines where the time remaining for each pipeline
is estimated by considering the work to be done and the speed at which that work
will be performed, taking (time-varying) parallelism into account. To get process-
ing speeds, ParaTimer relies on earlier debug runs of the same query on input data
samples generated by the user. In addition, ParaTimer identifies the critical path in
a query plan and then estimates progress along that path, effectively ignoring
other paths.
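As a rough illustration of this pipeline-based estimation, the following Python sketch (all names and numbers are hypothetical, not ParaTimer's actual code) computes the time remaining for each pipeline as the remaining work divided by the observed per-slot speed times the available parallelism, sums the estimates along the critical path, and reports both a no-failure estimate and one annotated with a single worst-case failure scenario, mirroring the set of scenario-labeled indicators described above.

```python
from dataclasses import dataclass

@dataclass
class Pipeline:
    total_work: float      # e.g., total bytes or tuples the pipeline must process
    done_work: float       # work already completed
    speed_per_slot: float  # per-slot processing speed observed in earlier debug runs
    slots: int             # degree of parallelism currently assigned

def time_remaining(p: Pipeline) -> float:
    """Remaining time = remaining work / aggregate processing speed."""
    remaining = max(p.total_work - p.done_work, 0.0)
    return remaining / (p.speed_per_slot * p.slots)

def critical_path_estimate(pipelines, worst_case_failure=False, task_redo_cost=0.0):
    """Sum the time remaining along the critical path (pipelines run in sequence).
    Optionally add the cost of redoing one task to model a single worst-case failure."""
    estimate = sum(time_remaining(p) for p in pipelines)
    if worst_case_failure:
        estimate += task_redo_cost
    return estimate

# Hypothetical two-pipeline MapReduce job: a map pipeline followed by a reduce pipeline.
pipelines = [
    Pipeline(total_work=10_000, done_work=6_000, speed_per_slot=5.0, slots=20),
    Pipeline(total_work=4_000, done_work=0, speed_per_slot=2.0, slots=10),
]
print("no-failure estimate:", critical_path_estimate(pipelines))
print("single worst-case failure:",
      critical_path_estimate(pipelines, worst_case_failure=True, task_redo_cost=120.0))
```
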
Zaharia et al. [137] have presented an approach to estimate the progress
of MapReduce tasks within environments of clusters with heterogeneous hardware
configurations. In these environments, choosing the node on which to run a specula-
tive task is as important as choosing the task. They proposed an algorithm for specu-
lative execution called LATE (Longest Approximate Time to End) which is based
on three principles: prioritizing tasks to speculate, selecting fast nodes on which to
run, and capping speculative tasks to prevent thrashing. In particular, the algorithm
speculatively executes the task that it suspects will finish farthest into the future,
because this task provides the greatest opportunity for a speculative copy to overtake
the original and reduce the job's response time. To get the best chance of beating the
original task with the speculative task, the algorithm only launches speculative tasks
on fast nodes (and not the first available node).
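A minimal sketch of this heuristic is shown below, assuming each running task reports a progress score in [0, 1] and its elapsed time; the thresholds and helper names are illustrative, not Hadoop's actual scheduler code. The estimated time to end is (1 - progress) / progress_rate, and a speculative copy is launched only for the task expected to finish farthest in the future, only on a fast node, and only while a cap on concurrent speculative tasks is respected.

```python
SPECULATIVE_CAP = 2        # max concurrent speculative copies (illustrative value)
SLOW_TASK_FRACTION = 0.25  # only speculate on the slowest 25% of tasks (illustrative)

def estimated_time_to_end(progress: float, elapsed: float) -> float:
    """LATE's core heuristic: assume progress continues at its observed rate."""
    progress_rate = progress / elapsed if elapsed > 0 else 0.0
    if progress_rate == 0.0:
        return float("inf")
    return (1.0 - progress) / progress_rate

def pick_speculative_task(running_tasks, node_is_fast: bool, speculative_running: int):
    """running_tasks: list of (task_id, progress, elapsed_seconds).
    Returns the task to re-execute speculatively on this node, or None."""
    if not node_is_fast or speculative_running >= SPECULATIVE_CAP:
        return None  # only use fast nodes, and cap speculative work to avoid thrashing
    ranked = sorted(running_tasks,
                    key=lambda t: estimated_time_to_end(t[1], t[2]),
                    reverse=True)
    # consider only the tasks expected to finish farthest into the future
    cutoff = max(1, int(len(ranked) * SLOW_TASK_FRACTION))
    candidates = ranked[:cutoff]
    return candidates[0][0] if candidates else None

tasks = [("map_3", 0.2, 300.0), ("map_7", 0.8, 310.0), ("map_9", 0.5, 290.0)]
print(pick_speculative_task(tasks, node_is_fast=True, speculative_running=0))  # -> "map_3"
```
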
The RAFT (Recovery Algorithms for Fast-Tracking) system [115,116] has been introduced, as a part of the Hadoop++
system [47], for tracking and recovering MapReduce jobs under task or node failures.
In particular, RAFT uses two main checkpointing mechanisms: local checkpointing
and query metadata checkpointing. On the one hand, the main idea of local check-
pointing is to utilize intermediate results, which are by default persisted by Hadoop,
as checkpoints of ongoing task progress computation. In general, map tasks spill
buffered intermediate results to local disk whenever the output buffer is about
to overflow. RAFT exploits this spilling phase to piggyback checkpointing metadata
on the latest spill of each map task. For each checkpoint, RAFT stores a triplet of
metadata that includes the taskID, which represents a unique task identifier; the spillID,
which represents the local path to the spilled data; and the offset, which specifies the last
byte of input data that was processed in that spill.
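To make the checkpoint structure concrete, the following sketch (hypothetical names and paths, not RAFT's actual implementation) records the (taskID, spillID, offset) triplet whenever a map task finishes a spill, and on recovery returns the highest checkpointed offset so a restarted task can reuse the existing spill files and skip input that was already processed.

```python
from dataclasses import dataclass

@dataclass
class SpillCheckpoint:
    task_id: str   # unique task identifier
    spill_id: str  # local path to the spilled data
    offset: int    # last byte of input processed in that spill

checkpoints = {}  # task_id -> list of SpillCheckpoint

def record_spill(task_id: str, spill_path: str, last_byte: int) -> None:
    """Piggyback a checkpoint on the spill the map task has just written."""
    checkpoints.setdefault(task_id, []).append(
        SpillCheckpoint(task_id, spill_path, last_byte))

def resume_offset(task_id: str) -> int:
    """On failure, restart the task on the same node from the last checkpointed
    offset, reusing the spill files that were already produced."""
    spills = checkpoints.get(task_id, [])
    return max((cp.offset for cp in spills), default=0)

record_spill("task_42", "/local/spills/task_42/spill_0", 64 * 1024 * 1024)
record_spill("task_42", "/local/spills/task_42/spill_1", 128 * 1024 * 1024)
print(resume_offset("task_42"))  # restart reading the input at byte 134217728
```
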
To recover from a task failure, the RAFT scheduler reallocates the failed task to the same node that was running the
task. Then, the node resumes the task from the last checkpoint and reuses the spills