[Figure 2.5 panels: the LHS mapper computes employee bonuses; the LHS reducer sorts on (dept-id, emp-id) pairs and sums up the employee bonuses; the RHS mapper retrieves the per-department bonus adjustments; the RHS reducer modifies the bonus adjustments and sorts on dept-id; a sort-merge merger matches keys on dept-id, joins the LHS and RHS reduced outputs, and computes the final employee bonuses.]

FIGURE 2.5 A sample execution of the map-reduce-merge framework. (From H. C. Yang et al., Map-reduce-merge: Simplified relational data processing on large clusters, in SIGMOD, pp. 1029-1040, 2007.)
recursively, select data partitions based on query conditions, and feed only selected
partitions to other primitives.
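To make the dataflow of Figure 2.5 concrete, the following Python snippet is a minimal, single-process sketch of the same computation. The table values are taken from the figure; the function names, the +0.05 adjustment step, and the in-memory data structures are illustrative assumptions, not part of the actual map-reduce-merge API or runtime.

```python
from collections import defaultdict

# Input data mirroring Figure 2.5: (emp-id, dept-id, bonus) records
# and (dept-id, bonus adjustment) records.
emp_bonuses = [(1, "B", 100), (1, "B", 50), (2, "A", 0), (3, "A", 150), (3, "A", 100)]
dept_adjustments = [("A", 0.9), ("B", 1.1)]

# LHS mapper: key each employee bonus on the (dept-id, emp-id) pair.
def lhs_map(records):
    for emp_id, dept_id, bonus in records:
        yield (dept_id, emp_id), bonus

# LHS reducer: sum bonuses per employee and sort on (dept-id, emp-id).
def lhs_reduce(pairs):
    sums = defaultdict(int)
    for key, bonus in pairs:
        sums[key] += bonus
    return sorted(sums.items())

# RHS reducer: modify the bonus adjustments (in the figure, 1.1 -> 1.15 and
# 0.9 -> 0.95, i.e., +0.05 -- an assumption here) and sort on dept-id.
def rhs_reduce(records):
    return sorted((dept_id, adj + 0.05) for dept_id, adj in records)

# Merger: sort-merge join LHS and RHS reduced outputs on dept-id,
# then compute the final employee bonuses.
def merge(lhs_out, rhs_out):
    adjustments = dict(rhs_out)
    for (dept_id, emp_id), bonus_sum in lhs_out:
        yield emp_id, round(bonus_sum * adjustments[dept_id], 2)

lhs_out = lhs_reduce(lhs_map(emp_bonuses))
rhs_out = rhs_reduce(dept_adjustments)
print(list(merge(lhs_out, rhs_out)))
# [(2, 0.0), (3, 237.5), (1, 172.5)]
```

Because both reduced outputs arrive sorted on the join key, the merger can stream through them without materializing either side in full; the dictionary lookup above simply stands in for that sort-merge step.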
Map-join-reduce [76] represents another approach, which introduces a filtering-join-aggregation programming model as an extension of the standard MapReduce filtering-aggregation model. In particular, in addition to the standard map and reduce operations of the MapReduce framework, it adds a third operation, join (called the joiner). To join multiple data sets for aggregation, users specify a set of join() functions and the join order between them. The runtime system then automatically joins the multiple input data sets according to the join order and invokes the join() functions to process the joined records. The approach also introduces a one-to-many shuffling strategy that shuffles each intermediate key/value pair to many joiners at one time. Combined with a tailored partitioning strategy, this one-to-many shuffling scheme can join multiple data sets in one phase instead of a sequence of MapReduce jobs. The runtime system for executing a map-join-reduce job launches two kinds of processes: MapTask and ReduceTask. Mappers run inside the MapTask process, whereas joiners and reducers are invoked inside the ReduceTask process. Therefore, the map-join-reduce process model allows the pipelining of intermediate results between joiners and reducers, since they run inside the same ReduceTask process.
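The filtering-join-aggregation idea can be sketched in a few lines of Python. The data sets (orders and customers), the function names, and the single-process control flow below are hypothetical illustrations; in the actual map-join-reduce runtime the mappers run in MapTask processes, while the joiner and reducer steps run together inside ReduceTask processes and are pipelined.

```python
from collections import defaultdict

# Two input data sets to be joined before aggregation (illustrative data).
orders = [("o1", "c1", 30.0), ("o2", "c1", 20.0), ("o3", "c2", 50.0)]
customers = [("c1", "US"), ("c2", "DE")]

# Filtering (map): each mapper tags its records with the data-set name and
# emits them keyed on the join attribute (customer id).
def map_orders(records):
    for order_id, cust_id, amount in records:
        yield cust_id, ("orders", amount)

def map_customers(records):
    for cust_id, country in records:
        yield cust_id, ("customers", country)

# Join (joiner): records shuffled to the same joiner are combined on the join key.
def joiner(tagged_records):
    by_key = defaultdict(lambda: {"orders": [], "customers": []})
    for key, (source, value) in tagged_records:
        by_key[key][source].append(value)
    for key, groups in by_key.items():
        for country in groups["customers"]:
            for amount in groups["orders"]:
                yield country, amount

# Aggregation (reduce): sum order amounts per country. In map-join-reduce,
# joined records flow to the reducer within the same ReduceTask process.
def reducer(joined):
    totals = defaultdict(float)
    for country, amount in joined:
        totals[country] += amount
    return dict(totals)

tagged = list(map_orders(orders)) + list(map_customers(customers))
print(reducer(joiner(tagged)))
# {'US': 50.0, 'DE': 50.0}
```

The one-to-many shuffling strategy corresponds, in this sketch, to routing each tagged key/value pair to every joiner whose join key it matches, so that a multiway join completes in one phase rather than as a chain of MapReduce jobs.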