can be implemented in a single job. Moreover, the map operation of the next iteration
can start without the synchronization barriers between jobs. Thus, Goal 1 and Goal 2
are achieved. Additionally, to achieve Goal 3, the iterated state data is separated from
the static structure data. The read-only structure data is queried in each iteration but
never changed, while the state data is updated in each iteration. Correspondingly,
two data flows, the state data flow (composed of state KVs) and the structure data
flow (composed of structure KVs), exist in iMapReduce.
In addition to the user-defined functions (UDFs) map and reduce, users have to
implement join in iMapReduce. The join UDF specifies the mapping rules between
the reducers and the mappers, based on which iMapReduce combines the state data
flow and the structure data flow before the map operation.
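The separation of the two data flows and the role of join can be illustrated with a minimal single-process sketch. It is not iMapReduce's actual API: the function names (join, map_fn, reduce_fn, iterate), the key-equality mapping rule, and the PageRank-style workload are all assumptions chosen for illustration. It shows only that the structure KVs are read in every iteration but never modified, while the state KVs produced by reduce are fed back into the next map.

```python
from collections import defaultdict

# Static structure KVs (assumed example): node -> out-links.
# Read in every iteration, never modified.
STRUCTURE = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}


def join(state_kvs, structure_kvs):
    """Hypothetical join rule: pair each state KV with the structure KV of the same key."""
    for key, state in state_kvs.items():
        yield key, state, structure_kvs[key]


def map_fn(key, rank, neighbours):
    """Spread a node's current rank evenly over its out-links."""
    share = rank / len(neighbours)
    for neighbour in neighbours:
        yield neighbour, share


def reduce_fn(key, values, n_nodes, damping=0.85):
    """Combine incoming shares into the node's updated state value."""
    return (1 - damping) / n_nodes + damping * sum(values)


def iterate(state, structure, rounds):
    """Run several iterations; only the state KVs are rewritten between rounds."""
    for _ in range(rounds):
        grouped = defaultdict(list)  # stands in for the shuffle phase
        for key, rank, neighbours in join(state, structure):
            for out_key, value in map_fn(key, rank, neighbours):
                grouped[out_key].append(value)
        # The reduce output becomes the state input of the next iteration.
        state = {
            key: reduce_fn(key, grouped.get(key, []), n_nodes=len(structure))
            for key in structure
        }
    return state


if __name__ == "__main__":
    initial_state = {key: 1 / len(STRUCTURE) for key in STRUCTURE}
    print(iterate(initial_state, STRUCTURE, rounds=10))
```

In this sketch the loop inside iterate plays the part of a single long-running job: no new job is started between rounds, and the structure dictionary is passed through unchanged.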
3.3 SYSTEM DESIGN
iMapReduce is designed and implemented by modifying Hadoop MapReduce; the Hadoop
MapReduce framework is changed to support iterative processing. The iterated state
data and the static structure data are separated with built-in framework support. In
addition, iMapReduce supports iteration termination, fault tolerance, and load balancing.
3.3.1 Overview
Figure 3.3 shows the system overview of iMapReduce. An iMapReduce job will
launch multiple map tasks and reduce tasks. Note that the number of map tasks and
[Figure: Map tasks 1-3, each with its own structure data partition (1-3) and Join and Map stages; a Shuffle stage; Reduce tasks 1-3, each with an Update stage; KV flows between the reduce tasks and the map tasks.]
FIGURE 3.3 System overview.