If the constraint is about the deadline τ for finishing the job, the problem of
minimizing the financial cost can be formulated as
$$
\begin{aligned}
\text{minimize}\quad & u\,(m + R)\,T_3(m, R)/\gamma \qquad\qquad \text{(17.16)}\\
\text{subject to}\quad & T_3(m, R) \le \tau,\quad m > 0,\ \text{and}\ R > 0.
\end{aligned}
$$
The above optimization problem can also be slightly modified to describe the case
where the user simply wants to find the most economical solution for the job
without a deadline; that is, the constraint T₃(m, R) ≤ τ is removed.
Note that the T₃ model parameters may be specific to a particular type of VM
instance, which also determines the parameters u and γ. Therefore, by applying
this optimization repeatedly, once for each candidate instance type, we
can also find which instance type is the best.
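Written out, choosing the best instance type amounts to a nested minimization: for each candidate type i, with unit price u_i, slots per instance γ_i, and a fitted time model T₃⁽ⁱ⁾, solve the inner problem and then take the cheapest type. The per-type indexing below is our notation for illustration, not the chapter's:

$$
i^{*} \;=\; \arg\min_{i}\ \min_{m,\,R}\ u_i\,(m + R)\,T_3^{(i)}(m, R)/\gamma_i
\quad \text{subject to} \quad T_3^{(i)}(m, R) \le \tau,\ m > 0,\ R > 0.
$$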
These optimization problems do not involve complicated parameters except for
the T₃ function. Once the concrete settings of the T₃ model parameters are learned,
these optimization problems can be solved readily, since they all fall into well-studied
categories of optimization problems for which many solution methods have been
published. In particular, the search space of m and R is quite limited: for many
medium-scale MapReduce jobs, they are integers normally less than 10,000. In this
case, a brute-force search over the entire space for the optimal result takes little
time, as sketched below. We therefore skip the details of solving these problems.
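As a minimal sketch of this brute-force approach, the following Python snippet enumerates (m, R) pairs and keeps the cheapest feasible configuration under Equation 17.16. The function t3 is a hypothetical placeholder for the fitted T₃ model, and all parameter values in the example are made up; the real functional form and coefficients come from the regression described earlier in the chapter.

```python
def t3(m, R, coeffs):
    """Hypothetical placeholder for the fitted T3(m, R) running-time model.
    The real form and coefficients are fitted per instance type; these are
    illustrative only."""
    b0, b1, b2 = coeffs
    return b0 + b1 / m + b2 / R

def minimize_cost(u, gamma, coeffs, tau=None, max_m=10000, max_R=10000):
    """Brute-force search over integer (m, R) for the cheapest configuration.

    Cost follows Equation 17.16: u * (m + R) * T3(m, R) / gamma.
    If tau is None, the deadline constraint T3(m, R) <= tau is dropped,
    giving the 'most economical, no deadline' variant of the problem.
    """
    best = None  # (cost, m, R)
    for m in range(1, max_m + 1):
        for R in range(1, max_R + 1):
            t = t3(m, R, coeffs)
            if tau is not None and t > tau:
                continue  # violates the deadline constraint
            cost = u * (m + R) * t / gamma
            if best is None or cost < best[0]:
                best = (cost, m, R)
    return best

# Choosing the best instance type: re-run the search once per type and keep
# the cheapest result. All prices, slot counts, and coefficients are made up.
instance_types = {
    "type_a": {"u": 0.10, "gamma": 4, "coeffs": (50.0, 2000.0, 800.0)},
    "type_b": {"u": 0.45, "gamma": 16, "coeffs": (30.0, 1200.0, 500.0)},
}
for name, p in instance_types.items():
    cost, m, R = minimize_cost(p["u"], p["gamma"], p["coeffs"],
                               tau=300.0, max_m=200, max_R=200)
    print(f"{name}: cost={cost:.2f}, m={m}, R={R}")
```

Because the search is O(max_m × max_R), in practice the ranges would be capped at realistic slot counts, as done in the example above.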
17.6 EXPERIMENTS
As we have shown, as long as the cost model is accurate, the optimization problems
are easy to solve. Therefore, the focus of our experiments is validating the formu-
lated cost model. We first describe the setup of the experiments, including the experi-
mental environment and the data sets. Four programs are used to evaluate the cost
model: WordCount, TeraSort, PageRank, and Join. Finally, a strict evaluation on
both the in-house cluster and the Amazon cloud is conducted to show the model's
goodness of fit and prediction accuracy.
17.6.1 Experimental Setup
The experiments are conducted on our in-house 16-node Hadoop cluster and on
Amazon EC2. We describe the setup of the environments and the data sets used in
the experiments as follows.
17.6.1.1 In-House Hardware and Hadoop Configuration
Each node in the in-house cluster has four quad-core 2.3-GHz AMD Opteron 2376
processors, 16 GB of memory, and two 500-GB hard drives, and is connected to the
other nodes through a gigabit switch. Hadoop 1.0.3 is installed on the cluster. One
node serves as the master node and the other 15 nodes as the slave nodes. The single
master node runs the JobTracker and the NameNode.