with similar inputs, many tasks are repeated and their results can be reused. To
define stability more precisely, consider performing MapReduce computations with
inputs I and I′ and consider the respective sets of tasks that are executed, denoted
T and T′. We say that a task t ∈ T′ is not matched if t ∉ T, that is, the task that is
performed with input I′ is not performed with input I. We say that a MapReduce
computation is stable if the time required to execute the unmatched tasks is small,
where small means sublinear in the size of the input.
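As a minimal illustration of this definition, the sketch below (with hypothetical task names; an actual system would identify tasks by a fingerprint of their inputs) computes the unmatched tasks as the set difference T′ \ T.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of the stability definition: a task is identified by a label
// (here a hypothetical fingerprint of its input), and the unmatched tasks of a
// run on I' are those whose labels did not occur in the run on I.
public class UnmatchedTasks {
    static Set<String> unmatched(Set<String> tasksOnI, Set<String> tasksOnIPrime) {
        Set<String> result = new HashSet<>(tasksOnIPrime);
        result.removeAll(tasksOnI);   // t in T' is unmatched if t is not in T
        return result;
    }

    public static void main(String[] args) {
        Set<String> T      = Set.of("map(chunk-a)", "map(chunk-b)", "reduce(key-1)");
        Set<String> TPrime = Set.of("map(chunk-a)", "map(chunk-b')", "reduce(key-1)");
        // Only map(chunk-b') must be re-executed; the computation is stable if
        // executing such tasks takes time sublinear in the size of the input.
        System.out.println(unmatched(T, TPrime));
    }
}
```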
In the case of MapReduce, stability can be affected by several factors, which
we can group into the following two categories: (a) making a small change to the
input can change the input to many tasks, causing these tasks to become unmatched;
(b) even if only a small number of tasks is unmatched, these tasks can take a long time
to execute, or the memoized data they could reuse can be costly to transfer. To address these issues, we introduce
techniques for (1) performing a stable input partitioning; (2) controlling the granular-
ity and stability of both Map and Reduce tasks; and (3) finding efficient scheduling
mechanisms to avoid unnecessary movement of memoized data.
Stable input partitioning. To see why using HDFS as an input to MapReduce
jobs leads to unstable computations, consider inserting a single data item in the mid-
dle of an input file. Since HDFS files are partitioned into fixed-sized chunks, this
small change will shift each partition point following the input change by a fixed
amount. If this amount is not a multiple of the chunk size, all subsequent map tasks
will be unmatched. (On average, a single insert will affect half of all map tasks.) The
problem gets even more challenging when we consider more complex changes, like
a permutation of the record order; such changes can be common: for instance, if
a crawler uses a depth-first strategy to crawl the web, a single link change can
move the position of an entire subtree in the input file. In this case, using standard
algorithms to compute the differences between the two input files is not viable, since
this would require running a polynomial-time algorithm (e.g., an edit-distance algo-
rithm). In Section 4.3, we explain how our new file system, called Inc-HDFS, leads to
stable input partitioning without compromising efficiency.
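The Inc-HDFS mechanism itself is deferred to Section 4.3; purely as an illustration of why content-defined chunk boundaries are stable under insertions, the following sketch places a boundary wherever a hash of the last few bytes matches a fixed bit pattern. The window size, hash function, and mask are illustrative choices, not the actual Inc-HDFS parameters.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of content-defined partitioning: a chunk boundary is declared
// wherever a hash of the trailing WINDOW bytes matches a fixed pattern, so each
// boundary depends only on nearby content rather than on a global byte offset.
public class ContentDefinedChunking {
    static final int WINDOW = 8;          // bytes examined for the boundary test (illustrative)
    static final int MASK = (1 << 4) - 1; // expected chunk length ~16 bytes (illustrative)

    static boolean isBoundary(byte[] data, int i) {
        if (i + 1 < WINDOW) return false;
        int h = 0;
        for (int j = i + 1 - WINDOW; j <= i; j++)
            h = h * 31 + (data[j] & 0xFF);          // hash over the trailing window only
        return (h & MASK) == MASK;
    }

    static List<String> chunks(String input) {
        byte[] data = input.getBytes(StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (isBoundary(data, i)) {              // cut the chunk after position i
                out.add(new String(data, start, i + 1 - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        if (start < data.length)
            out.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        return out;
    }

    public static void main(String[] args) {
        String original = "alpha,bravo,charlie,delta,echo,foxtrot,golf,hotel,india,juliet";
        String modified = "alpha,bravo,INSERTED,charlie,delta,echo,foxtrot,golf,hotel,india,juliet";
        // Boundaries more than WINDOW bytes past the insertion fall at the same content
        // positions as before, so the chunks they delimit (and the map tasks consuming
        // them) remain matched; with fixed-size chunks every later partition would shift.
        System.out.println(chunks(original));
        System.out.println(chunks(modified));
    }
}
```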
Granularity control. A stable partitioning leads directly to the stability of map
tasks. The input to the reduce tasks, however, is determined only by the outputs
of the map tasks, since each reduce task processes all values produced in the map
phase and associated with a given key. Consider, for instance, the case when a single
key-value pair is added to a reduce task that processes a large number of values (e.g.,
linear in the size of the input). This is problematic since it causes the entire task to be
recomputed. Furthermore, even if we found a way of dividing large reduce tasks into
multiple smaller tasks, this per se would not solve the problem, since we still need to
aggregate the results of the smaller tasks in a way that avoids a large recomputation.
Thus, we need a way to (i) split the reduce task into smaller tasks and (ii) eliminate
potentially long (namely linear-size) dependencies between these smaller tasks. We
solve this problem with a new contraction phase, where reduce tasks are broken into
subtasks organized in a tree. This breaks up the reduce task while ensuring that long
dependencies between tasks are not formed, since all paths in the tree will be of
logarithmic length. Section 4.4 describes our proposed approach.
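The actual design of the contraction phase appears in Section 4.4; as a hedged sketch of the underlying idea, the following code splits the values destined for one reduce key into small subtasks and combines their partial results pairwise, level by level, so every result depends on only a logarithmic number of combination steps. The leaf and combine functions here are a simple sum, and the combine step is assumed to be associative.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of a contraction-style reduce: values for one key are split into
// small leaf subtasks, and partial results are combined pairwise into a balanced
// tree. Changing one value re-executes one leaf and the O(log n) combinations on
// its path to the root, instead of re-running one monolithic linear-size task.
public class ContractionTree {
    static final int LEAF_SIZE = 4; // values per leaf subtask (illustrative)

    // Leaf subtask: reduce a small slice of the values (word-count-style sum).
    static long leaf(List<Long> values, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) sum += values.get(i);
        return sum;
    }

    // Combine two partial results; in a memoized setting each call is a subtask
    // whose output can be reused whenever its two inputs are unchanged.
    static long combine(long left, long right) {
        return left + right;
    }

    static long contractionReduce(List<Long> values) {
        List<Long> level = new ArrayList<>();
        for (int i = 0; i < values.size(); i += LEAF_SIZE)
            level.add(leaf(values, i, Math.min(i + LEAF_SIZE, values.size())));
        // Repeatedly combine adjacent partial results; the tree has logarithmic height,
        // so no result depends on a linear-length chain of subtasks.
        while (level.size() > 1) {
            List<Long> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                if (i + 1 < level.size()) next.add(combine(level.get(i), level.get(i + 1)));
                else next.add(level.get(i));
            }
            level = next;
        }
        return level.isEmpty() ? 0 : level.get(0);
    }

    public static void main(String[] args) {
        List<Long> counts = new ArrayList<>();
        for (int i = 0; i < 20; i++) counts.add(1L);
        System.out.println(contractionReduce(counts)); // 20
    }
}
```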
Scheduling. To avoid moving large amounts of memoized data, it is important to
schedule a task on the machine that stores the memoized results that are being reused.
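As a hedged sketch of this placement policy (the class and method names below are illustrative, not the system's actual scheduler API), the following code prefers the machine that already holds a memoized result for a task's fingerprint and falls back to any available machine otherwise.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch of memoization-aware task placement: a task is assigned to the
// machine that already stores the memoized result for its fingerprint, so reused
// outputs need not cross the network; tasks with no memoized result can run anywhere.
public class MemoAwareScheduler {
    private final Map<String, String> memoLocation = new HashMap<>(); // fingerprint -> machine

    void recordResult(String taskFingerprint, String machine) {
        memoLocation.put(taskFingerprint, machine);
    }

    String place(String taskFingerprint, List<String> availableMachines) {
        String preferred = memoLocation.get(taskFingerprint);
        if (preferred != null && availableMachines.contains(preferred))
            return preferred;                 // run where the memoized result already resides
        return availableMachines.get(0);      // otherwise any machine (no memoized data to move)
    }

    public static void main(String[] args) {
        MemoAwareScheduler s = new MemoAwareScheduler();
        s.recordResult("map(chunk-a)", "node-3");
        System.out.println(s.place("map(chunk-a)", List.of("node-1", "node-2", "node-3"))); // node-3
        System.out.println(s.place("map(chunk-z)", List.of("node-1", "node-2", "node-3"))); // node-1
    }
}
```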