Database Reference
In-Depth Information
To implement persistent tasks, there should be enough available task slots . The
number of available map/reduce task slots is the number of map/reduce tasks that
the framework can accommodate (allows to be executed) simultaneously. In Hadoop
MapReduce, the master splits a job into many small map/reduce tasks, the number
of map/reduce tasks executed simultaneously cannot be larger than the number of
the available map/reduce task slots (the default number in Hadoop is 2 per slave
worker). Once a slave worker completes an assigned task, it requests another one
from the master. iMapReduce requires to guarantee that there are sufficient available
task slots for all the persistent tasks to start at the beginning. This means that the
task granularity should be set coarser to have fewer tasks. Clearly, this might make
load balancing challenging. iMapReduce addresses this issue with a load balancing
scheme, which will be introduced later.
3.3.2.2 Passing State Data from Reduce to Map
In MapReduce, the output of a reduce task is written to DFS and might be used later
in the next MapReduce job. In contrast, iMapReduce allows the state data to be
passed from the reduce task to the map task directly, so as to trigger the join opera-
tion with the structure data and to start the map execution for the next iteration. To
do so, iMapReduce builds persistent socket connections from the reduce tasks to the
map tasks.
3.3.2.3 Joining State Data with Structure Data
iMapReduce differentiates the structure data from the state data and processes them
separately. iMapReduce will combine them before using them in the map operation.
By specifying the partition of the structure data and customizing the shuffling of the
state data correspondingly, a reduce task's output state KVs and their corresponding
structure KVs are always in the same map task. In each iteration, before executing
the map operation, for each state KV, a join operation that extracts its corresponding
structure KV is performed. Then they are used together in the map operation.
iMapReduce achieves automatic join by leveraging sorted files. With the support
of MapReduce framework, the state KVs from reduce task to map task are sorted
in the order of their keys. Suppose a structure key is SK and its corresponding state
key is cord ( SK ), iMapReduce sorts the structure KVs of each structure data partition
in the order of cord ( SK ). Therefore, given the sorted state data flow, by sequentially
parsing the structure data partition file once, all the corresponding state KVs and
structure KVs are extracted. The joined state KV and structure KV are provided to
map operation as the input parameters. Users can concentrate on implementing the
map operation, without worrying about the maintenance of the structure data in their
iterative algorithm implementations.
3.3.2.4 Asynchronous Execution of Map Tasks
In MapReduce iterative algorithm implementations, there are two synchronization
barriers in each iteration; one is between mappers and reducers and another one
is between MapReduce jobs. Due to the synchronization barrier between jobs, the
map tasks of the next iteration job cannot start before the completion of the previ-
ous iteration job, which requires the completion of all reduce tasks. However, since
Search WWH ::




Custom Search