with similar inputs, many tasks are repeated and their results can be reused. To
define stability more precisely, consider performing MapReduce computations with
inputs I and I′ and consider the respective sets of tasks that are executed, denoted
T and T′. We say that a task t ∈ T′ is not matched if t ∉ T, that is, the task that is
performed with input I′ is not performed with input I. We say that a MapReduce
computation is stable if the time required to execute the unmatched tasks is small,
where small means sublinear in the size of the input.
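As a minimal illustration of this definition, the sketch below (with hypothetical task names; an actual system would identify tasks by a fingerprint of their inputs) computes the unmatched tasks as the set difference T′ \ T.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch of the stability definition: a task is identified by a label
// (here a hypothetical fingerprint of its input), and the unmatched tasks of a
// run on I' are those whose labels did not occur in the run on I.
public class UnmatchedTasks {
    static Set<String> unmatched(Set<String> tasksOnI, Set<String> tasksOnIPrime) {
        Set<String> result = new HashSet<>(tasksOnIPrime);
        result.removeAll(tasksOnI);   // t in T' is unmatched if t is not in T
        return result;
    }

    public static void main(String[] args) {
        Set<String> T      = Set.of("map(chunk-a)", "map(chunk-b)", "reduce(key-1)");
        Set<String> TPrime = Set.of("map(chunk-a)", "map(chunk-b')", "reduce(key-1)");
        // Only map(chunk-b') must be re-executed; the computation is stable if
        // executing such tasks takes time sublinear in the size of the input.
        System.out.println(unmatched(T, TPrime));
    }
}
```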
In the case of MapReduce, stability can be affected by several factors, which
we can group into the following two categories: (a) making a small change to the
input can change the input to many tasks, causing these tasks to become unmatched;
(b) even if only a small number of tasks is unmatched, these tasks can take a long time
to execute, or the memoized data they could reuse can be costly to transfer. To address these issues, we introduce
techniques for (1) performing a stable input partitioning; (2) controlling the granular-
ity and stability of both Map and Reduce tasks; and (3) finding efficient scheduling
mechanisms to avoid unnecessary movement of memoized data.
Stable input partitioning. To see why using HDFS as an input to MapReduce
jobs leads to unstable computations, consider inserting a single data item in the mid-
dle of an input file. Since HDFS files are partitioned into fixed-sized chunks, this
small change will shift each partition point following the input change by a fixed
amount. If this amount is not a multiple of the chunk size, all subsequent map tasks
will be unmatched. (On average, a single insert will affect half of all map tasks.) The
problem gets even more challenging when we consider more complex changes, like
a permutation of the record order; such changes can be common: for instance, if
a crawler uses a depth-first strategy to crawl the web, a single link change can
move the position of an entire subtree in the input file. In this case, using standard
algorithms to compute the differences between the two input files is not viable, since
this would require running a polynomial-time algorithm (e.g., an edit-distance algo-
rithm). In Section 4.3, we explain how our new file system, called Inc-HDFS, leads to
stable input partitioning without compromising efficiency.
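The Inc-HDFS mechanism itself is deferred to Section 4.3; purely as an illustration of why content-defined chunk boundaries are stable under insertions, the following sketch places a boundary wherever a hash of the last few bytes matches a fixed bit pattern. The window size, hash function, and mask are illustrative choices, not the actual Inc-HDFS parameters.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of content-defined partitioning: a chunk boundary is declared
// wherever a hash of the trailing WINDOW bytes matches a fixed pattern, so each
// boundary depends only on nearby content rather than on a global byte offset.
public class ContentDefinedChunking {
    static final int WINDOW = 8;          // bytes examined for the boundary test (illustrative)
    static final int MASK = (1 << 4) - 1; // expected chunk length ~16 bytes (illustrative)

    static boolean isBoundary(byte[] data, int i) {
        if (i + 1 < WINDOW) return false;
        int h = 0;
        for (int j = i + 1 - WINDOW; j <= i; j++)
            h = h * 31 + (data[j] & 0xFF);          // hash over the trailing window only
        return (h & MASK) == MASK;
    }

    static List<String> chunks(String input) {
        byte[] data = input.getBytes(StandardCharsets.UTF_8);
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (isBoundary(data, i)) {              // cut the chunk after position i
                out.add(new String(data, start, i + 1 - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        if (start < data.length)
            out.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        return out;
    }

    public static void main(String[] args) {
        String original = "alpha,bravo,charlie,delta,echo,foxtrot,golf,hotel,india,juliet";
        String modified = "alpha,bravo,INSERTED,charlie,delta,echo,foxtrot,golf,hotel,india,juliet";
        // Boundaries more than WINDOW bytes past the insertion fall at the same content
        // positions as before, so the chunks they delimit (and the map tasks consuming
        // them) remain matched; with fixed-size chunks every later partition would shift.
        System.out.println(chunks(original));
        System.out.println(chunks(modified));
    }
}
```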
Granularity control. A stable partitioning leads directly to the stability of map
tasks. The input to the reduce tasks, however, is determined only by the outputs
of the map tasks, since each reduce task processes all values produced in the map
phase and associated with a given key. Consider, for instance, the case when a single
key-value pair is added to a reduce task that processes a large number of values (e.g.,
linear in the size of the input). This is problematic since it causes the entire task to be
recomputed. Furthermore, even if we found a way of dividing large reduce tasks into
multiple smaller tasks, this per se would not solve the problem, since we still need to
aggregate the results of the smaller tasks in a way that avoids a large recomputation.
Thus, we need a way to (i) split the reduce task into smaller tasks and (ii) eliminate
potentially long (namely linear-size) dependencies between these smaller tasks. We
solve this problem with a new contraction phase, where reduce tasks are broken into
subtasks organized in a tree. This breaks up the reduce task while ensuring that long
dependencies between tasks are not formed, since all paths in the tree will be of
logarithmic length. Section 4.4 describes our proposed approach.
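The actual design of the contraction phase appears in Section 4.4; as a hedged sketch of the underlying idea, the following code splits the values destined for one reduce key into small subtasks and combines their partial results pairwise, level by level, so every result depends on only a logarithmic number of combination steps. The leaf and combine functions here are a simple sum, and the combine step is assumed to be associative.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of a contraction-style reduce: values for one key are split into
// small leaf subtasks, and partial results are combined pairwise into a balanced
// tree. Changing one value re-executes one leaf and the O(log n) combinations on
// its path to the root, instead of re-running one monolithic linear-size task.
public class ContractionTree {
    static final int LEAF_SIZE = 4; // values per leaf subtask (illustrative)

    // Leaf subtask: reduce a small slice of the values (word-count-style sum).
    static long leaf(List<Long> values, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) sum += values.get(i);
        return sum;
    }

    // Combine two partial results; in a memoized setting each call is a subtask
    // whose output can be reused whenever its two inputs are unchanged.
    static long combine(long left, long right) {
        return left + right;
    }

    static long contractionReduce(List<Long> values) {
        List<Long> level = new ArrayList<>();
        for (int i = 0; i < values.size(); i += LEAF_SIZE)
            level.add(leaf(values, i, Math.min(i + LEAF_SIZE, values.size())));
        // Repeatedly combine adjacent partial results; the tree has logarithmic height,
        // so no result depends on a linear-length chain of subtasks.
        while (level.size() > 1) {
            List<Long> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                if (i + 1 < level.size()) next.add(combine(level.get(i), level.get(i + 1)));
                else next.add(level.get(i));
            }
            level = next;
        }
        return level.isEmpty() ? 0 : level.get(0);
    }

    public static void main(String[] args) {
        List<Long> counts = new ArrayList<>();
        for (int i = 0; i < 20; i++) counts.add(1L);
        System.out.println(contractionReduce(counts)); // 20
    }
}
```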
Scheduling. To avoid moving large amounts of memoized data, it is important to
schedule a task on the machine that stores the memoized results that are being reused.
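As a hedged sketch of this placement policy (the class and method names below are illustrative, not the system's actual scheduler API), the following code prefers the machine that already holds a memoized result for a task's fingerprint and falls back to any available machine otherwise.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch of memoization-aware task placement: a task is assigned to the
// machine that already stores the memoized result for its fingerprint, so reused
// outputs need not cross the network; tasks with no memoized result can run anywhere.
public class MemoAwareScheduler {
    private final Map<String, String> memoLocation = new HashMap<>(); // fingerprint -> machine

    void recordResult(String taskFingerprint, String machine) {
        memoLocation.put(taskFingerprint, machine);
    }

    String place(String taskFingerprint, List<String> availableMachines) {
        String preferred = memoLocation.get(taskFingerprint);
        if (preferred != null && availableMachines.contains(preferred))
            return preferred;                 // run where the memoized result already resides
        return availableMachines.get(0);      // otherwise any machine (no memoized data to move)
    }

    public static void main(String[] args) {
        MemoAwareScheduler s = new MemoAwareScheduler();
        s.recordResult("map(chunk-a)", "node-3");
        System.out.println(s.place("map(chunk-a)", List.of("node-1", "node-2", "node-3"))); // node-3
        System.out.println(s.place("map(chunk-z)", List.of("node-1", "node-2", "node-3"))); // node-1
    }
}
```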