their input data are available. This limitation results in unnecessary
synchronization overhead.
3. The MapReduce framework cannot separate the state data from the structure
data. The structure data is shuffled between map and reduce in every
iteration, even though it remains unchanged across all iterations. This
results in unnecessary communication overhead.
The synchronization overhead and the static data communication overhead have been
studied and measured in [21]. iMapReduce aims to address these limitations by providing
an efficient distributed computing framework for implementing iterative algorithms.
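To make the third limitation concrete, the following minimal Python simulation (a sketch, not Hadoop code; all names are illustrative) runs one iteration of a PageRank-style computation under the plain MapReduce model. Because the reducer must reassemble each node's record for the next job, the mapper is forced to re-emit the adjacency lists (structure data) alongside the rank contributions (state data), so the unchanged structure data crosses the shuffle in every iteration:

```python
# Illustrative simulation of one PageRank-style MapReduce iteration.
# "State" data (ranks) changes each iteration; "structure" data
# (adjacency lists) never does, yet both are shuffled every time.
from collections import defaultdict

def map_phase(records):
    """records: (node, (rank, out_links)) pairs read from DFS."""
    emitted = []
    for node, (rank, links) in records:
        # State data: rank contributions sent to each neighbor.
        for dst in links:
            emitted.append((dst, ("state", rank / len(links))))
        # Structure data: the adjacency list must be re-emitted so the
        # reducer can rebuild (node, (rank, links)) for the next job,
        # even though it is identical in every iteration.
        emitted.append((node, ("structure", links)))
    return emitted

def reduce_phase(emitted, damping=0.85):
    grouped = defaultdict(list)
    for key, value in emitted:
        grouped[key].append(value)
    result = []
    for node, values in grouped.items():
        links, total = [], 0.0
        for kind, v in values:
            if kind == "structure":
                links = v
            else:
                total += v
        result.append((node, ((1 - damping) + damping * total, links)))
    return result

graph = [("a", (1.0, ["b", "c"])), ("b", (1.0, ["c"])), ("c", (1.0, ["a"]))]
shuffled = map_phase(graph)
# One unchanged structure record per node crosses the shuffle each iteration:
structure_records = [kv for kv in shuffled if kv[1][0] == "structure"]
next_graph = reduce_phase(shuffled)
```

Separating the static structure data from the iterated state data, so that only the latter is shuffled, is precisely the optimization iMapReduce targets.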
3.2 PROGRAMMING MODEL
The observations and analysis above expose the limitations of implementing
iterative algorithms in Hadoop MapReduce. iMapReduce is proposed to overcome
these performance penalties. The design goals of iMapReduce are as follows:
Goal 1: Supporting iterative processing in one job. In the MapReduce
implementation, a series of MapReduce jobs, each consisting of map tasks and
reduce tasks, is scheduled. Figure 3.1a shows the data flow in the MapReduce
implementation. Each MapReduce job has to load its input data from DFS before
the map operation and dump its output data to DFS after the reduce operation.
In the next iteration, the map function loads the iterated data from DFS again
[Figure: (a) MapReduce data flow, a chain of jobs (Job1, Job2, ...) each reading from and writing to DFS between its Map and Reduce phases; (b) iMapReduce data flow, a single job in which data cycles between Map and Reduce without intermediate DFS round-trips.]
FIGURE 3.1
Data flow of (a) MapReduce and (b) iMapReduce.
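The job-chaining driver pattern of Figure 3.1a can be sketched as follows. This is a hedged, self-contained Python simulation, not real Hadoop code: the in-memory `dfs` dictionary, `run_mapreduce_job`, and the toy halving computation are all illustrative stand-ins. It shows how each iteration becomes a separate job with its own DFS load before map and DFS dump after reduce:

```python
# Sketch of the per-iteration job chaining that plain MapReduce forces
# on iterative algorithms. Every iteration pays one DFS load and one
# DFS dump, which iMapReduce's single-job design avoids.
from collections import defaultdict

def run_mapreduce_job(dfs, in_path, out_path, mapper, reducer, io_counter):
    records = dfs[in_path]          # load input from "DFS" before map
    io_counter["reads"] += 1
    grouped = defaultdict(list)
    for k, v in records:
        for k2, v2 in mapper(k, v):
            grouped[k2].append(v2)
    output = [reducer(k, vs) for k, vs in grouped.items()]
    dfs[out_path] = output          # dump output to "DFS" after reduce
    io_counter["writes"] += 1

# Toy iterated computation (halve each value); it stands in for any
# iterative algorithm such as PageRank or k-means.
def mapper(k, v):
    yield k, v / 2.0

def reducer(k, vs):
    return k, sum(vs)

dfs = {"/iter0": [("x", 8.0), ("y", 4.0)]}
io = {"reads": 0, "writes": 0}
for i in range(3):  # the driver schedules one new MapReduce job per iteration
    run_mapreduce_job(dfs, f"/iter{i}", f"/iter{i + 1}", mapper, reducer, io)
# After 3 iterations: 3 DFS loads and 3 DFS dumps of the iterated data.
```

Under this pattern the DFS traffic grows linearly with the number of iterations; folding the loop into one persistent job, as in Figure 3.1b, removes the intermediate loads and dumps.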
 