starting the iterative process from scratch. Since the state data are relatively small, dumping them to DFS (i.e., making several copies on several other machines for data redundancy) is expected to take little time. Note that the checkpointing process is performed in parallel with the iterative processing.
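Although the text does not show code for this step, the sketch below illustrates one way a task could dump a snapshot of its state data to DFS from a background thread so that checkpointing overlaps the next iteration. The class name StateCheckpointer, the Map<Long, Double> state representation, and the file layout are illustrative assumptions; only the Hadoop FileSystem calls are standard API.

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StateCheckpointer {
    private final ExecutorService background = Executors.newSingleThreadExecutor();
    private final FileSystem fs;
    private final Path checkpointDir;

    public StateCheckpointer(Configuration conf, Path checkpointDir) throws IOException {
        this.fs = FileSystem.get(conf);      // handle to the underlying DFS
        this.checkpointDir = checkpointDir;
    }

    // Snapshot the (small) state data and write it to DFS in the background,
    // so the next iteration is not blocked by checkpointing.
    public void checkpoint(int taskId, int iteration, Map<Long, Double> state) {
        final Map<Long, Double> snapshot = Map.copyOf(state);
        background.submit(() -> {
            Path file = new Path(checkpointDir, "task-" + taskId + "-iter-" + iteration);
            try (FSDataOutputStream out = fs.create(file, true)) {  // DFS replicates the file
                for (Map.Entry<Long, Double> e : snapshot.entrySet()) {
                    out.writeLong(e.getKey());
                    out.writeDouble(e.getValue());
                }
            } catch (IOException ex) {
                // a failed checkpoint is not fatal; the previous checkpoint remains usable
            }
            return null;
        });
    }
}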
3.3.5 Load Balancing
In MapReduce, the master decomposes a submitted job into multiple tasks. A slave worker completes one task and then requests another one from the master. This “complete-and-then-feed” task scheduling mechanism makes good use of computing resources. In iMapReduce, by contrast, all tasks are assigned to slave workers at once in the beginning, since tasks are persistent in iMapReduce. This one-time assignment conflicts with MapReduce's task scheduling strategy, so iMapReduce cannot obtain this benefit of the original MapReduce framework.
Lack of load balancing support may lead to several problems: (1) Even though the initial input data are evenly partitioned among workers, the computation workload is not necessarily evenly distributed, due to the skewed degree distribution. (2) Even if the computation workload is evenly distributed among workers, the best utilization of computing resources still cannot be guaranteed, since a large cluster might consist of heterogeneous servers [19].
To address these problems, iMapReduce performs task migration when the work-
load is unbalanced among workers. After each iteration, each reduce task sends an
iteration completion report to the master, which contains the reduce task id, the itera-
tion number, and the processing time of that iteration. Upon receipt of all the reduce
tasks' reports, the master calculates the average processing time. Based on the aver-
age time, the master calculates the time deviation of each worker and identifies the
slower workers and the faster workers. If the time deviation is larger than a threshold,
an MRPair in the slowest worker is migrated to the fastest worker in the following
three steps. The master (1) kills an MRPair in the slow worker, (2) launches a new
MRPair in the fast worker, and (3) sends a rollback command to the other map/
reduce tasks. All the map/reduce tasks that receive the rollback command skip their
current processing. The rolled back map tasks reload the latest checkpointed state
data from DFS and proceed. The newly launched map tasks load not only the state data but also the corresponding structure data from DFS.
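A minimal sketch of this master-side decision is given below, assuming an IterationReport structure with the fields named in the text and a 25% deviation threshold; these names, the threshold value, and the migrateMRPair hook are illustrative assumptions rather than the actual iMapReduce implementation.

import java.util.List;

public class LoadBalancer {
    // Relative deviation from the average iteration time that triggers
    // migration (assumed value, for illustration only).
    private static final double THRESHOLD = 0.25;

    // One iteration completion report sent by a reduce task to the master.
    public static class IterationReport {
        final int taskId;
        final int iteration;
        final long processingTimeMs;
        final String worker;

        IterationReport(int taskId, int iteration, long processingTimeMs, String worker) {
            this.taskId = taskId;
            this.iteration = iteration;
            this.processingTimeMs = processingTimeMs;
            this.worker = worker;
        }
    }

    // Called once the master has received the reports of all reduce tasks.
    public void balance(List<IterationReport> reports) {
        double avg = reports.stream().mapToLong(r -> r.processingTimeMs).average().orElse(0);
        IterationReport slowest = reports.stream()
                .max((a, b) -> Long.compare(a.processingTimeMs, b.processingTimeMs)).orElseThrow();
        IterationReport fastest = reports.stream()
                .min((a, b) -> Long.compare(a.processingTimeMs, b.processingTimeMs)).orElseThrow();

        // Migrate only when the slowest worker deviates from the average by
        // more than the threshold.
        if (avg > 0 && (slowest.processingTimeMs - avg) / avg > THRESHOLD) {
            migrateMRPair(slowest.worker, fastest.worker);
        }
    }

    private void migrateMRPair(String slowWorker, String fastWorker) {
        // (1) kill an MRPair on the slow worker, (2) launch a new MRPair on the
        // fast worker, and (3) broadcast a rollback command so the other tasks
        // reload the latest checkpointed state data from DFS. Left abstract here.
    }
}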
3.4 ALGORITHM IMPLEMENTATION
iMapReduce supports the implementation of iterative algorithms and is compatible
with traditional Hadoop MapReduce. Users can turn on iterative processing func-
tionalities for implementing iterative algorithms or turn them off for implementing
Hadoop MapReduce jobs as usual.
3.4.1 iMapReduce API
To implement an iterative algorithm in iMapReduce, users should implement the following interfaces, which are slightly changed from the Hadoop API (written in Java).
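A rough sketch of how such an iterative map interface could look is shown below. It only illustrates the general idea that the per-iteration state data and the static structure data are passed to map() separately; the names and signatures are assumptions for illustration, not the actual iMapReduce API.

import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical iterative map interface: unlike a standard Hadoop Mapper,
// map() receives the joined state data (updated every iteration) and the
// structure data (loaded once and kept across iterations) for a key.
public interface IterativeMapper<K extends WritableComparable<K>,
                                 S extends Writable,
                                 D extends Writable,
                                 V extends Writable> {
    void map(K key, S state, D structure,
             OutputCollector<K, V> output, Reporter reporter) throws IOException;
}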