should be created and in the cost model. In the linear case, which is common as we
have observed, the cost model can be further simplified to
$$T_2(M, m, R) = \beta_0 + \beta_1 \frac{M}{m} + \beta_2 \frac{MR}{m} + \beta_3 \frac{m}{R} + \beta_4 \frac{M \log M}{R} + \beta_5 \frac{M}{R} + \beta_6 M + \beta_7 R. \tag{17.10}$$
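As a quick illustration, the simplified model of Equation 17.10 can be evaluated directly once the eight weights are known. The sketch below is only a reading aid: the function name `t2_cost` and the example weights are hypothetical, and the natural logarithm is an assumption (a different base would only rescale the weight β_4).

```python
import math


def t2_cost(beta, M, m, R):
    """Evaluate T_2(M, m, R) from Equation 17.10.

    beta holds the eight weights beta_0..beta_7; M, m, R are the input
    variables of the cost model. The natural logarithm is an assumption;
    another base only rescales beta_4.
    """
    terms = [
        1.0,                  # constant term (beta_0)
        M / m,                # weighted by beta_1
        M * R / m,            # weighted by beta_2
        m / R,                # weighted by beta_3
        M * math.log(M) / R,  # weighted by beta_4
        M / R,                # weighted by beta_5
        M,                    # weighted by beta_6
        R,                    # weighted by beta_7
    ]
    if len(beta) != len(terms):
        raise ValueError("expected eight weights beta_0..beta_7")
    return sum(b * x for b, x in zip(beta, terms))


# Hypothetical weights, chosen only to exercise the function.
example_beta = [1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0]
print(t2_cost(example_beta, M=8, m=2, R=4))  # 1 + 2*(8/2) + 3*4 = 21.0
```

Because the model is linear in the weights, evaluating it amounts to a dot product between the weight vector and a fixed set of terms computed from M, m, and R.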
17.4 LEARNING THE MODEL
With the cost function formulated in terms of the input variables M, m, and R, we need to learn the parameters β_i. Note that the β_i differ from application to application. We design a learning procedure as follows.
First, for a specific MapReduce program, we randomly choose the variables M, m, and R from certain ranges. For example, m and R (i.e., r) are chosen to be no larger than 50; M is chosen so that at least two rounds of map processes are available for testing. Second, we collect the time cost of a test run of the MapReduce job for each setting of (M, m, R), which forms the training data set. Third, regression modeling [14] is applied to learn the model from the training data with the transformed variables
$$x_1 = \frac{M}{m},\quad x_2 = \frac{MR}{m},\quad x_3 = \frac{m}{R},\quad x_4 = \frac{M \log M}{R},\quad x_5 = \frac{M}{R},\quad x_6 = M,\quad x_7 = R. \tag{17.11}$$
Because each β_i has a practical meaning, namely the weight of a component in the total cost, we require β_i ≥ 0 for i = 0, …, 7, which calls for non-negative linear regression [14] to solve the learning problem. The cross-validation method [6] is then used to validate the performance of the learned model. We give more details in the experiments.
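The three steps of the learning procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: "at least two rounds of map processes" is read as M ≥ 2m, `synthetic_run` is a hypothetical stand-in for timing a real test run, and the non-negative fit uses a simple cyclic coordinate descent rather than whichever solver the authors used (a library routine such as `scipy.optimize.nnls` would be a natural choice in practice). All function names are made up for this example.

```python
import math
import random


def features(M, m, R):
    """Constant term (for beta_0) plus x1..x7 from Equation 17.11."""
    return [1.0, M / m, M * R / m, m / R,
            M * math.log(M) / R, M / R, float(M), float(R)]


def sample_settings(n, rng, max_slots=50, max_rounds=6):
    """Draw n random (M, m, R) settings.

    Assumption: "at least two rounds of map processes" is read as M >= 2 * m.
    """
    settings = []
    for _ in range(n):
        m = rng.randint(1, max_slots)
        R = rng.randint(1, max_slots)
        M = m * rng.randint(2, max_rounds)
        settings.append((M, m, R))
    return settings


def fit_nonneg(X, y, sweeps=200):
    """Least squares with beta >= 0, solved by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta for beta = 0
    col_sq = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            step = sum(X[i][j] * resid[i] for i in range(n)) / col_sq[j]
            new = max(0.0, beta[j] + step)  # 1-D optimum, clipped at zero
            delta = new - beta[j]
            for i in range(n):
                resid[i] -= delta * X[i][j]
            beta[j] = new
    return beta


def predict(beta, x):
    """Predicted time cost for one feature vector x."""
    return sum(b * xi for b, xi in zip(beta, x))


def cross_validate(X, y, k=5):
    """Mean relative error on the held-out folds of a k-fold split."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        held_out = [i for i in range(len(y)) if i % k == fold]
        beta = fit_nonneg([X[i] for i in train], [y[i] for i in train])
        errors += [abs(predict(beta, X[i]) - y[i]) / y[i] for i in held_out]
    return sum(errors) / len(errors)


def synthetic_run(M, m, R, rng):
    """Hypothetical stand-in for timing a real test run of the job."""
    true_beta = [5.0, 2.0, 0.05, 0.5, 0.01, 0.3, 0.02, 0.4]
    return predict(true_beta, features(M, m, R)) * rng.uniform(0.97, 1.03)


if __name__ == "__main__":
    rng = random.Random(0)
    settings = sample_settings(60, rng)
    X = [features(*s) for s in settings]
    y = [synthetic_run(*s, rng) for s in settings]
    print("beta:", [round(b, 3) for b in fit_nonneg(X, y)])
    print("mean relative CV error:", round(cross_validate(X, y), 3))
```

Several of the transformed variables are strongly correlated, so the individual weights recovered by such a fit may differ noticeably from the generating ones even when the predicted times are close; the cross-validation error on held-out settings is the more meaningful check.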
17.5 OPTIMIZATION OF RESOURCE PROVISIONING
With the cost model, we are now ready to find the optimal settings for different decision problems. We look for the best resource allocation in three typical situations: (1) with a limited financial budget; (2) with a time constraint; and (3) the optimal tradeoff curve without any constraint. In the following, we formulate these problems as optimization problems based on the cost model.
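To make the three situations concrete, the sketch below enumerates every (m, R) pair on a small grid for a fixed M and then picks answers by brute force. It is not the chapter's optimization algorithm. The financial cost used here, `price * (m + R) * time`, is a hypothetical charge introduced only for illustration, as are the function names, the weights, and the grid bound.

```python
import math
from collections import namedtuple

Setting = namedtuple("Setting", "m R time money")


def t2_cost(beta, M, m, R):
    """Simplified time cost model T_2 (Equation 17.10), natural log assumed."""
    x = [1.0, M / m, M * R / m, m / R, M * math.log(M) / R, M / R, M, R]
    return sum(b * xi for b, xi in zip(beta, x))


def enumerate_settings(beta, M, max_slots, price):
    """Evaluate time and (hypothetical) financial cost on an (m, R) grid."""
    out = []
    for m in range(1, max_slots + 1):
        for R in range(1, max_slots + 1):
            t = t2_cost(beta, M, m, R)
            # Assumed charge: price per slot per unit time, for m + R slots.
            out.append(Setting(m, R, t, price * (m + R) * t))
    return out


def fastest_within_budget(settings, budget):
    """Situation (1): minimize time subject to a financial budget."""
    feasible = [s for s in settings if s.money <= budget]
    return min(feasible, key=lambda s: s.time) if feasible else None


def cheapest_within_deadline(settings, deadline):
    """Situation (2): minimize financial cost subject to a time constraint."""
    feasible = [s for s in settings if s.time <= deadline]
    return min(feasible, key=lambda s: s.money) if feasible else None


def tradeoff_curve(settings):
    """Situation (3): settings that no other setting beats on both axes."""
    front, best_money = [], float("inf")
    for s in sorted(settings, key=lambda s: (s.time, s.money)):
        if s.money < best_money:  # strictly cheaper than every faster setting
            front.append(s)
            best_money = s.money
    return front


if __name__ == "__main__":
    # Hypothetical weights standing in for a learned model.
    beta = [5.0, 2.0, 0.05, 0.5, 0.01, 0.3, 0.02, 0.4]
    grid = enumerate_settings(beta, M=200, max_slots=20, price=0.01)
    print(fastest_within_budget(grid, budget=20.0))
    print(cheapest_within_deadline(grid, deadline=100.0))
    print(len(tradeoff_curve(grid)), "settings on the tradeoff curve")
```

Exhaustive enumeration is only practical for small grids, but it gives a reference answer against which a faster optimization method can be checked.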
In all the scenarios we consider, we assume the model parameters β_i have been learned from sample runs in small-scale settings. For simplicity of presentation, we assume the simplified model T_2 (Equation 17.10) is applied; cost models with other reduce complexities do not change the optimization algorithm. Since the input data is fixed for a specific MapReduce job, M is a constant. We also assume that all general MapReduce system configurations have been optimized via other methods [1,7,8] and are fixed for both small- and large-scale settings. With this setup, the time cost function becomes