Database Reference
In-Depth Information
We are more interested in the relationship among the total time T , the input data
size M × b , the user defined number of reduce processes R , and the number of map
and reduce slots, m and r .
This general representation can be slightly simplified with a number of settings.
As we discussed, it is safe to assume R = r , as running Reduces in multiple rounds
ΦΦ . To make it more convenient to manipu-
late the equation, we also remove ⌈⌉ from ⌈ M / m ⌉ by assuming M m and M / m is an
integer. After plugging in Equations 17.3 and 17.6 and keeping only the variables M ,
R , and m in the cost model, we get the detailed model
R
r
might be unnecessary. Thus,
r
r
M
m
MR
m
m
R
MM
R
log
TMmR
(,,)
=+ +
ββ β
+
β
+
β
1
0
1
2
3
4
(17.8)
+
+++
Rg M
R
+
β
MR
/
+
β
M
ββ
7
,
5
6
where β i are the positive constants specific to each application. Note that T 1 ( M , m ,
R ) is not linear to its variables, but it is linear to the transformed components: M / m ,
MR / m , m / R , M log M / R , M / R , M , R , and g ( M / R ). The parameter β i defines the contri-
bution of each components in the model. β 0 represents some constant cost invariant
to the parameters. β i are the weights of each components derived in the component-
wise cost analysis. Finally, ϵ represents the overall noise. We leave the discussion on
the item g ( M / R ) later.
17.3.4.1 With Combiner
In the map process, the combiner program is used to aggregate the results by the
key. If there are k keys in the map output, the combiner program reduces the map
result to k records. The cost of the combiner is only subject to the output of the map
program. Thus, it can be incorporated into the parameter β 1 . However, the combiner
function reduces the output data of the map process and thus affects the cost of the
reduce phase. With the combiner, the amount of data that a reduce process needs to
pull from the map is changed to
b R = Mk / R .
(17.9)
Since the item M/R is still there, the cost model (Equation 17.8) applies without any
change.
17.3.4.2 Function g ()
The complexity of reduce program has to be estimated with the specific application.
There are some special cases that the g () item can be removed from Equation 17.8.
If g () is linear to the size of the input data, then its contribution can be merged to the
factor β 4 , because g ( M / R ) ~ M / R . For other cases that cannot be merged, a new item
 
Search WWH ::




Custom Search