architecture. The task tracker not only manages task execution but also manages
caches and indices on the slave node, and redirects each task's cache and index
accesses to the local file system.
In the MapReduce framework, each map or reduce task contains its portion of
the input data, and the task runs by applying the map/reduce function to its input
data records; the life cycle of the task ends when the processing of all of its input
records is complete. The iMapReduce framework [ 240 ]
supports the feature of iterative processing by keeping alive each map and reduce
task during the whole iterative process. In particular, when all of the input data of a
persistent task are parsed and processed, the task becomes dormant, waiting for the
new updated input data. A map task waits for the results from the reduce tasks
and is activated to work on the new input records when the required data from the
reduce tasks arrive. The reduce tasks wait for the map tasks' output and are
activated synchronously, as in MapReduce. Jobs can terminate their iterative process
in one of two ways:
1. Defining a fixed number of iterations: the iterative algorithm stops after it iterates n
times.
2. Bounding the distance between two consecutive iterations: the iterative algorithm
stops when the distance between the outputs of two consecutive iterations is less
than a specified threshold.
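The iterative model and the two termination conditions above can be sketched as a simple driver loop. This is an illustrative sketch, not the iMapReduce API: the function names, the dictionary-based state representation, and the Euclidean distance metric are assumptions chosen for clarity.

```python
def euclidean_distance(old, new):
    """L2 distance between two iterations' outputs (dicts key -> value)."""
    keys = set(old) | set(new)
    return sum((old.get(k, 0.0) - new.get(k, 0.0)) ** 2 for k in keys) ** 0.5

def run_iterative_job(map_fn, reduce_fn, state, max_iters=10, epsilon=1e-3):
    """Iterate map/reduce over `state` until the fixed iteration bound is
    reached (condition 1) or two consecutive outputs are closer than
    `epsilon` (condition 2)."""
    previous = None
    for iteration in range(1, max_iters + 1):
        # Map phase: emit (key, value) pairs from the current state.
        intermediate = {}
        for key, value in state.items():
            for k, v in map_fn(key, value):
                intermediate.setdefault(k, []).append(v)
        # Reduce phase: aggregate values per key into the next state.
        state = {k: reduce_fn(k, vs) for k, vs in intermediate.items()}
        # Condition 2: bounded distance between consecutive iterations.
        if previous is not None and euclidean_distance(previous, state) < epsilon:
            break
        previous = dict(state)
    # Condition 1 is the loop bound itself (max_iters).
    return state, iteration
```

For example, a job whose map step halves each value converges geometrically and stops once consecutive outputs differ by less than the threshold, rather than exhausting the iteration budget.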
The iMapReduce runtime system performs the termination check after each iteration.
To terminate after a fixed number of iterations, each persistent map/reduce
task records its iteration number and terminates itself when that number exceeds the
specified limit. To bound the distance between the outputs of two consecutive iterations,
the reduce tasks can save the output of two consecutive iterations and compute
the distance between them. If the termination condition is satisfied, the master notifies
all the map and reduce tasks to terminate their execution.
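The per-task bookkeeping described above can be sketched as follows. The class and method names are hypothetical (iMapReduce is Java-based and its actual interfaces differ); the sketch only shows the state a persistent reduce task would keep: its iteration counter and the previous iteration's output, from which the distance check is computed locally.

```python
class PersistentReduceTask:
    """Sketch of a persistent reduce task's termination bookkeeping."""

    def __init__(self, max_iters=None, epsilon=None):
        self.max_iters = max_iters    # fixed-iteration bound (condition 1)
        self.epsilon = epsilon        # distance threshold (condition 2)
        self.iteration = 0
        self.previous_output = None   # saved output of the prior iteration

    def finish_iteration(self, output):
        """Called once per iteration with this task's reduce output
        (a dict key -> float). Returns True if the task should signal
        the master that the job can terminate."""
        self.iteration += 1
        terminate = False
        # Condition 1: the task records its own iteration number.
        if self.max_iters is not None and self.iteration >= self.max_iters:
            terminate = True
        # Condition 2: distance between two consecutive saved outputs.
        if self.epsilon is not None and self.previous_output is not None:
            keys = set(output) | set(self.previous_output)
            dist = sum((output.get(k, 0.0) - self.previous_output.get(k, 0.0)) ** 2
                       for k in keys) ** 0.5
            if dist < self.epsilon:
                terminate = True
        self.previous_output = dict(output)
        return terminate
```

In the real system the master, not the task, makes the final call: it collects these signals and then notifies all map and reduce tasks to stop.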
Other projects have also been implemented to support iterative processing on the
MapReduce framework. For example, Twister [ 50 ] is a MapReduce runtime with
an extended programming model that supports iterative MapReduce computations
efficiently [ 125 ]. It uses a publish/subscribe messaging infrastructure for communi-
cation and data transfers, and supports long running map/reduce tasks. In particular,
it provides programming extensions to MapReduce with broadcast and scatter type
data transfers. Microsoft has also developed a project that provides an iterative
MapReduce runtime for Windows Azure called Daytona [ 37 ].
Data and Process Sharing
With the emergence of cloud computing, the use of an analytical query processing
infrastructure (e.g., Amazon EC2) can be directly mapped to monetary value.
Since different MapReduce jobs can perform similar work, there can be many
opportunities for sharing the execution of their work. Such sharing reduces the
overall amount of work, which consequently leads to lower monetary costs.