Database Reference
In-Depth Information
Therefore, the main job of map tasks in Incoop is to implement task-level memoiza-
tion. To do this, after a map task runs, we store its results persistently (instead of
discarding them after the job execution) and insert a corresponding reference to the
result in the memoization server.
During incremental runs, map tasks query the memoization server to determine
if their output has already been computed. If so, they output the location of the
memoized result, and conclude. Figure 4.3 illustrates this process: part (a) describes
the initial run and part (b) describes the incremental run where chunk 2 is modi-
fied (and replaced by chunk 4) and the map tasks for chunks 1 and 3 can reuse the
memoized results.
Incremental reduce. The reduce task processes the output of the map phase:
each reduce task has an associated key k , collects all the key-value pairs generated
by all map tasks for k , and applies the reduce function. For efficiency, we apply two
levels of memoization in this case. First, we memoize the inputs and outputs of the
entire reduce task to try to reuse these results in a single step. Second, we break down
the reduce phase into a contraction phase followed by a smaller invocation of the
reduce function to address the stability issues we discussed.
The first level of memoization is very similar to that of map tasks: the memoiza-
tion server maintains a mapping from a hash of the input to the location of the result
of the reduce task. A minor difference is that a reduce task receives input from
several map tasks, and as such the key of that mapping is the concatenation of the
collision-resistant hashes from all these outputs. For the reduce task to compute this
key, instead of immediately copying the output from all map tasks, it fetches the
hashes only to determine if the reduce task can be skipped entirely. Only if this is
not the case the data is transferred from map to reduce tasks.
As we mentioned, this first level has the limitation that small changes in the input
cause the entire reduce task to be re-executed, which can result in work that is linear
(a)
(b)
Chunk-1
Chunk-2
Chunk-3
Chunk-1
Chunk-4
Chunk-3
Execute job
Execute job
Job
tracker
Job
tracker
Chunk-1
Chunk-2
Chunk-3
Chunk-1
Chunk-4
Chunk-3
Map-1
Map-2
Map-3
Map-1
Map-2
Map-3
FIGURE 4.3
Incremental map tasks. (a) Initial run. (b) Incremental run.
Search WWH ::




Custom Search