To efficiently support iterative data analysis applications, HaLoop incorporates the following changes into the basic Hadoop MapReduce framework:
• It exposes a new application programming interface that simplifies the expression of iterative MapReduce programs.
• Its master node contains a new loop control module that repeatedly starts new map-reduce steps composing the loop body until a user-specified stopping condition is met.
• It uses a new task scheduler that leverages data locality.
• It caches and indexes application data on the slave nodes.
In principle, HaLoop relies on the same file system and has the same task queue structure as Hadoop, but the task scheduler and task tracker modules are modified, and the loop control, caching, and indexing modules are newly introduced into the architecture. The task tracker not only manages task execution but also manages the caches and indexes on the slave node, and redirects each task's cache and index accesses to the local file system.
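HaLoop's loop control module automates the iteration that a plain Hadoop application must otherwise drive by hand from a client-side loop. For contrast, a minimal hand-rolled driver in vanilla Hadoop might look like the following sketch; the identity Mapper/Reducer placeholders, the per-iteration output paths, and the "convergence"/"CHANGED" counter are illustrative assumptions for this example, not HaLoop's actual interface.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        int maxIterations = 10;   // fixed upper bound on the number of iterations
        long threshold = 0;       // stop when no more records change
        Path input = new Path(args[0]);
        for (int i = 0; i < maxIterations; i++) {
            Path output = new Path(args[1] + "/iter-" + i);
            Job job = Job.getInstance(conf, "loop-body-" + i);
            job.setJarByClass(IterativeDriver.class);
            // The identity Mapper/Reducer stand in for the application's loop body.
            job.setMapperClass(Mapper.class);
            job.setReducerClass(Reducer.class);
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, input);
            FileOutputFormat.setOutputPath(job, output);
            if (!job.waitForCompletion(true)) System.exit(1);
            // Assumed convention: the real reducer increments this counter for
            // every record whose value changed in the current iteration.
            long changed = job.getCounters()
                              .findCounter("convergence", "CHANGED").getValue();
            if (changed <= threshold) break;  // user-specified stopping condition met
            input = output;  // this step's output becomes the next step's input
        }
    }
}

Everything in this loop (the per-iteration job submission, the rereading of loop-invariant data from the distributed file system, and the convergence test) is what HaLoop's loop control, caching, and indexing modules move into the framework itself.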
In the MapReduce framework, each map or reduce task is given a portion of the input data, runs the map or reduce function over its input records, and ends its life cycle once all of those records have been processed. The iMapReduce framework [138] supports iterative processing by keeping every map and reduce task alive during the whole iterative process. In particular, when a persistent task has parsed and processed all of its input data, it becomes dormant, waiting for new updated input data. A map task waits for the results of the reduce tasks and is activated to work on the new input records when the required data from the reduce tasks arrive. The reduce tasks wait for the map tasks' output and are activated synchronously, as in MapReduce.
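The dormant-and-reactivate behavior of a persistent task can be pictured as a worker thread that blocks on a queue until the other side of the pipeline delivers new data. The following sketch is a plain-Java illustration of that life cycle, not iMapReduce's actual implementation; the class name, the feed method, and the empty-list termination signal are all assumptions made for the example.

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PersistentMapTask implements Runnable {
    // Queue through which the reduce side delivers new input for the next
    // iteration; an empty list is used here as a termination signal.
    private final BlockingQueue<List<String>> fromReduce = new LinkedBlockingQueue<>();

    public void feed(List<String> reduceOutput) throws InterruptedException {
        fromReduce.put(reduceOutput);
    }

    @Override
    public void run() {
        try {
            while (true) {
                List<String> records = fromReduce.take(); // dormant until data arrives
                if (records.isEmpty()) break;             // master signaled termination
                for (String record : records) {
                    map(record);                          // apply the map function
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void map(String record) {
        // Placeholder map function; a real task would emit key/value pairs
        // to the reduce tasks of the same iteration.
        System.out.println("map(" + record + ")");
    }

    public static void main(String[] args) throws InterruptedException {
        PersistentMapTask task = new PersistentMapTask();
        Thread worker = new Thread(task);
        worker.start();
        task.feed(List.of("record-1", "record-2")); // input for one iteration
        task.feed(List.of());                       // terminate the persistent task
        worker.join();
    }
}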
Jobs can terminate their iterative process in one of two ways:
1. Defining a fixed number of iterations: the iterative algorithm stops after it has iterated n times.
2. Bounding the distance between two consecutive iterations: the iterative algorithm stops when the distance between the outputs of two consecutive iterations is less than a threshold.
The iMapReduce runtime system performs the termination check after each iteration. To terminate after a fixed number of iterations, each persistent map/reduce task records its iteration number and terminates itself when that number exceeds the threshold. To bound the distance between the outputs of two consecutive iterations, the reduce tasks save the output of two consecutive iterations and compute the distance between them. If the termination condition is satisfied, the master notifies all map and reduce tasks to terminate their execution.
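As a concrete illustration of the second check, the sketch below keeps the reduce output of the previous iteration in memory and compares it against the current one. The class name, the L1 distance measure, and the double-valued records are assumptions chosen for the example, not iMapReduce's actual code.

import java.util.Map;

public class TerminationCheck {
    private Map<String, Double> previous;   // reduce output of iteration i - 1
    private final double threshold;
    private final int maxIterations;
    private int iteration = 0;

    public TerminationCheck(double threshold, int maxIterations) {
        this.threshold = threshold;
        this.maxIterations = maxIterations;
    }

    /** Returns true if the iterative job should stop. */
    public boolean shouldStop(Map<String, Double> current) {
        iteration++;
        if (iteration >= maxIterations) return true;  // fixed-iteration bound
        if (previous != null) {
            double distance = 0.0;  // L1 distance between consecutive iterations
            for (Map.Entry<String, Double> e : current.entrySet()) {
                double old = previous.getOrDefault(e.getKey(), 0.0);
                distance += Math.abs(e.getValue() - old);
            }
            if (distance < threshold) return true;    // converged
        }
        previous = current;  // save this iteration's output for the next check
        return false;
    }
}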
Other projects have also been implemented to support iterative processing on top of the MapReduce framework. For example, Twister* is a MapReduce runtime with an extended programming model that supports iterative MapReduce computations.
* http://www.iterativemapreduce.org/.