Database Reference
In-Depth Information
We are more interested in the relationship among the total time
T
, the input data
size
M
×
b
, the user defined number of reduce processes
R
, and the number of map
and reduce slots,
m
and
r
.
This general representation can be slightly simplified with a number of settings.
As we discussed, it is safe to assume
R
=
r
, as running Reduces in multiple rounds
ΦΦ
. To make it more convenient to manipu-
late the equation, we also remove ⌈⌉ from ⌈
M
/
m
⌉ by assuming
M
≥
m
and
M
/
m
is an
integer. After plugging in Equations 17.3 and 17.6 and keeping only the variables
M
,
R
, and
m
in the cost model, we get the detailed model
R
r
might be unnecessary. Thus,
r
r
M
m
MR
m
m
R
MM
R
log
TMmR
(,,)
=+ +
ββ β
+
β
+
β
1
0
1
2
3
4
(17.8)
+
+++
Rg
M
R
+
β
MR
/
+
β
M
ββ
7
,
5
6
where β
i
are the positive constants specific to each application. Note that
T
1
(
M
,
m
,
R
) is not linear to its variables, but it is linear to the transformed components:
M
/
m
,
MR
/
m
,
m
/
R
,
M
log
M
/
R
,
M
/
R
,
M
,
R
, and
g
(
M
/
R
). The parameter β
i
defines the contri-
bution of each components in the model. β
0
represents some constant cost invariant
to the parameters. β
i
are the weights of each components derived in the component-
wise cost analysis. Finally, ϵ represents the overall noise. We leave the discussion on
the item
g
(
M
/
R
) later.
17.3.4.1 With Combiner
In the map process, the combiner program is used to aggregate the results by the
key. If there are
k
keys in the map output, the combiner program reduces the map
result to
k
records. The cost of the combiner is only subject to the output of the map
program. Thus, it can be incorporated into the parameter β
1
. However, the combiner
function reduces the output data of the map process and thus affects the cost of the
reduce phase. With the combiner, the amount of data that a reduce process needs to
pull from the map is changed to
b
R
= Mk
/
R
.
(17.9)
Since the item M/R is still there, the cost model (Equation 17.8) applies without any
change.
17.3.4.2 Function
g
()
The complexity of reduce program has to be estimated with the specific application.
There are some special cases that the
g
() item can be removed from Equation 17.8.
If
g
() is linear to the size of the input data, then its contribution can be merged to the
factor β
4
, because
g
(
M
/
R
) ~
M
/
R
. For other cases that cannot be merged, a new item
Search WWH ::
Custom Search