iMapReduce - Large Scale and Big Data: Processing and Management - page 114


can be implemented in a single job. Moreover, the map operation of the next iteration can start without the synchronization barriers between jobs. Thus, Goal 1 and Goal 2 are achieved. Additionally, to achieve Goal 3, the iterated state data is separated from the static structure data. The read-only structure data is queried in each iteration but never changed, while the state data is updated in each iteration. Correspondingly, two data flows exist in iMapReduce: the state data flow (composed of state KVs) and the structure data flow (composed of structure KVs).
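
The split between static structure data and iterated state data can be illustrated with a small in-memory sketch. This is not iMapReduce code: the PageRank-style update, the three-node graph, the damping factor, and the names `structure`, `state`, and `iterate` are all assumptions chosen for illustration. The point is only that the structure KVs are read in every iteration and never modified, while the state KVs are replaced in every iteration.

```python
from types import MappingProxyType

# Structure KVs (assumed example graph): node -> outgoing neighbors.
# MappingProxyType makes the mapping read-only, mirroring the static structure data.
structure = MappingProxyType({
    "a": ("b", "c"),
    "b": ("c",),
    "c": ("a",),
})


def iterate(state, structure, damping=0.85):
    """Run one PageRank-style iteration.

    Reads the structure KVs, leaves them untouched, and returns new state KVs.
    """
    n = len(structure)
    new_state = {node: (1.0 - damping) / n for node in structure}
    for node, neighbors in structure.items():
        share = state[node] / len(neighbors)
        for dst in neighbors:
            new_state[dst] += damping * share
    return new_state


# State KVs: node -> current rank. Only this data changes between iterations.
state = {node: 1.0 / len(structure) for node in structure}
for _ in range(20):
    state = iterate(state, structure)
```

Because only `state` changes, the structure data could in principle be loaded once and reused by every iteration, which is the motivation the text gives for keeping the two data flows apart.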

In addition to the user-defined functions (UDFs) map and reduce, users have to implement join in iMapReduce. The join UDF lets users specify the mapping rules between the reducers and the mappers, based on which iMapReduce combines the state data flow and the structure data flow before the map operation.
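
One way to picture the role of join is the sketch below. It is a hypothetical illustration, not the iMapReduce API: the function names `partition`, `join`, and `map_fn`, the one-to-one pairing of reduce task i with map task i, and the PageRank-style map logic are all assumptions. The state KVs produced by a reduce task are paired by key with the structure KVs held by the matching map task, and the map function then receives both values together.

```python
from collections import defaultdict

NUM_TASKS = 3  # assumed number of map tasks and reduce tasks


def partition(key, num_tasks=NUM_TASKS):
    """Hypothetical partitioner shared by structure data and reducer output.

    Uses a byte sum instead of hash() so the result is stable across runs.
    """
    return sum(key.encode("utf-8")) % num_tasks


def join(state_kvs, structure_partition):
    """Hypothetical join: attach the structure value to each state KV by key."""
    for key, state_value in state_kvs:
        yield key, state_value, structure_partition[key]


def map_fn(key, rank, neighbors):
    """Example map: spread a node's rank evenly over its outgoing neighbors."""
    for dst in neighbors:
        yield dst, rank / len(neighbors)


# Assumed example graph, split into one structure partition per map task.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
structure_parts = [dict() for _ in range(NUM_TASKS)]
for node, neighbors in graph.items():
    structure_parts[partition(node)][node] = neighbors

# State KVs as they might leave each reduce task, partitioned the same way.
reducer_output = [[] for _ in range(NUM_TASKS)]
for node in graph:
    reducer_output[partition(node)].append((node, 1.0 / len(graph)))

# Reduce task i feeds map task i; join runs before map on each task.
map_output = defaultdict(float)
for task_id in range(NUM_TASKS):
    joined = join(reducer_output[task_id], structure_parts[task_id])
    for key, rank, neighbors in joined:
        for dst, contribution in map_fn(key, rank, neighbors):
            map_output[dst] += contribution  # summed here only to inspect the result
```

Using the same partitioner for both data flows is what guarantees, in this sketch, that every state KV finds its structure KV on the local map task without a further shuffle.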

3.3 SYSTEM DESIGN

iMapReduce is designed and implemented by modifying Hadoop MapReduce: the Hadoop MapReduce framework is changed to support iterative processing. The iterated state data and the static structure data are separated with built-in framework support. In addition, iteration termination, fault tolerance, and load balancing are supported in iMapReduce.

3.3.1 Overview

Figure 3.3 shows the system overview of iMapReduce. An iMapReduce job launches multiple map tasks and reduce tasks. Note that the number of map tasks and

FIGURE 3.3 System overview. [Figure labels: Map tasks 1 to 3, each with its own structure data partition (1 to 3) and Join and Map stages; Shuffle; Reduce tasks 1 to 3; KV; Update.]
