Database Reference
In-Depth Information
emp-id
emp-info: dept-id
emp-info: bonus
dept-id
dept-info: bonus adjustment
1
B
innovation award ($100)
1.1
0.9
B
A
1
2
3
3
B
hard worker award ($50)
A
NULL ($0)
RHS mapper retrieves bonus adjustment
A
high-performer ($150)
A
Innovation award ($100)
dept-id
bonus adjustment
1.15
LHS mapper computes emp bonuses
B
0.95
A
emp-id
dept-id
bonus
1
B
$100
RHS reducer modified bonus adjustment and
sorts on dept-id
$50
$0
1
2
B
A
A
A
dept-id
bonus adjustment
3
3
$150
$100
A
0.95
1.15
B
LHS reducer sorts on (dept-id, emp-id)
pair and sums up emp bonuses
match keys on dept-id
emp-id
dept-id
bonus-sum
2
A
A
B
$0
3
1
$250
$150
emp-id
bonus
A sort-merge merger joins LHS and
RHS reduced outputs, then
computes final emp bonuses.
$0
$237.5
$172.5
2
3
1
Fig. 9.5
A sample execution of the Map-Reduce-Merge framework
Afrati and Ullman [ 60 , 61 ] have presented another approach to improve the
join phase in the MapReduce framework. The approach aims to optimize the
communication cost by focusing on selecting the most appropriate attributes that
are used to partition and replicate the data among the reduce process. Therefore,
it begins by identifying the map-key , the set of attributes that identify the Reduce
process to which a Map process must send a particular tuple. Each attribute of the
map-key gets a “ share ” which is the number of buckets into which its values are
hashed, to form a component of the identifier of a Reduce process. Relations have
their tuples replicated in limited fashion of which the degree of replication depends
on the shares for those map-key attributes that are missing from their schema.
The approach considers two important special join cases: chain joins (represents
a sequence of two-way join operations where the output of one operation in this
sequence is used as an input to another operation in a pipelined fashion) and star
joins (represents joining of a large fact table with several smaller dimension tables).
In each case, the proposed algorithm is able to determine the map-key and determine
the shares that yield the least replication. The proposed approach is not always
Search WWH ::




Custom Search