If the constraint is about the deadline τ for finishing the job, the problem of
minimizing the financial cost can be formulated as
$$
\begin{aligned}
\text{minimize}\quad & u\,(m + R)\,T_3(m, R)/\gamma \qquad\qquad \text{(17.16)}\\
\text{subject to}\quad & T_3(m, R) \le \tau,\quad m > 0,\ \text{and}\ R > 0.
\end{aligned}
$$
The above optimization problem can also be slightly modified to describe the case
where the user simply wants to find the most economical solution for the job
without a deadline; that is, the constraint T₃(m, R) ≤ τ is removed.
Note that the T₃ model parameters may be specific to a particular type of VM
instance, which also determines the parameters u and γ. Therefore, by applying
this optimization repeatedly, once for each candidate instance type, we
can also find which instance type is the best.
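Written out, choosing the best instance type amounts to a nested minimization: for each candidate type i, with unit price u_i, slots per instance γ_i, and a fitted time model T₃⁽ⁱ⁾, solve the inner problem and then take the cheapest type. The per-type indexing below is our notation for illustration, not the chapter's:

$$
i^{*} \;=\; \arg\min_{i}\ \min_{m,\,R}\ u_i\,(m + R)\,T_3^{(i)}(m, R)/\gamma_i
\quad \text{subject to} \quad T_3^{(i)}(m, R) \le \tau,\ m > 0,\ R > 0.
$$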
These optimization problems do not involve complicated parameters except for
the T₃ function. Once the concrete settings of the T₃ model parameters are learned,
these optimization problems can be solved readily, since they all fall into well-studied
categories of optimization problems for which many solution methods have been
published. In particular, the search space of m and R is quite limited: for many
medium-scale MapReduce jobs, they are integers normally less than 10,000. In this
case, a brute-force search over the entire space for the optimal result takes little
time, as sketched below. We therefore skip the details of solving these problems.
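As a minimal sketch of this brute-force approach, the following Python snippet enumerates (m, R) pairs and keeps the cheapest feasible configuration under Equation 17.16. The function t3 is a hypothetical placeholder for the fitted T₃ model, and all parameter values in the example are made up; the real functional form and coefficients come from the regression described earlier in the chapter.

```python
def t3(m, R, coeffs):
    """Hypothetical placeholder for the fitted T3(m, R) running-time model.
    The real form and coefficients are fitted per instance type; these are
    illustrative only."""
    b0, b1, b2 = coeffs
    return b0 + b1 / m + b2 / R

def minimize_cost(u, gamma, coeffs, tau=None, max_m=10000, max_R=10000):
    """Brute-force search over integer (m, R) for the cheapest configuration.

    Cost follows Equation 17.16: u * (m + R) * T3(m, R) / gamma.
    If tau is None, the deadline constraint T3(m, R) <= tau is dropped,
    giving the 'most economical, no deadline' variant of the problem.
    """
    best = None  # (cost, m, R)
    for m in range(1, max_m + 1):
        for R in range(1, max_R + 1):
            t = t3(m, R, coeffs)
            if tau is not None and t > tau:
                continue  # violates the deadline constraint
            cost = u * (m + R) * t / gamma
            if best is None or cost < best[0]:
                best = (cost, m, R)
    return best

# Choosing the best instance type: re-run the search once per type and keep
# the cheapest result. All prices, slot counts, and coefficients are made up.
instance_types = {
    "type_a": {"u": 0.10, "gamma": 4, "coeffs": (50.0, 2000.0, 800.0)},
    "type_b": {"u": 0.45, "gamma": 16, "coeffs": (30.0, 1200.0, 500.0)},
}
for name, p in instance_types.items():
    cost, m, R = minimize_cost(p["u"], p["gamma"], p["coeffs"],
                               tau=300.0, max_m=200, max_R=200)
    print(f"{name}: cost={cost:.2f}, m={m}, R={R}")
```

Because the search is O(max_m × max_R), in practice the ranges would be capped at realistic slot counts, as done in the example above.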
17.6 EXPERIMENTS
As we have shown, as long as the cost model is accurate, the optimization problems
are easy to solve. Therefore, the focus of our experiments is validating the formu-
lated cost model. We first describe the setup of the experiments, including the experi-
mental environment and the data sets. Four programs are used to evaluate the cost
model: WordCount, TeraSort, PageRank, and Join. Finally, a strict evaluation on
both the in-house cluster and the Amazon cloud is conducted to show the model's
goodness of fit and prediction accuracy.
17.6.1 Experimental Setup
The experiments are conducted on our in-house 16-node Hadoop cluster and on
Amazon EC2. We describe the setup of the environments and the data sets used in
the experiments as follows.
17.6.1.1 In-House Hardware and Hadoop Configuration
Each node in the in-house cluster has four quad-core 2.3-GHz AMD Opteron 2376
processors, 16 GB of memory, and two 500-GB hard drives, and is connected to the
other nodes through a gigabit switch. Hadoop 1.0.3 is installed on the cluster. One
node serves as the master node and the other 15 nodes as the slave nodes. The single
master node runs the JobTracker and the NameNode.