Toward Optimal Resource Provisioning for Economical and Green MapReduce Computing in the Cloud - Large Scale and Big Data: Processing and Management

parameters can be learned from test runs. Based on this cost model, we can solve
a number of decision problems, such as finding the amount of resources that
minimizes the financial cost subject to a financial budget or a time deadline.
We have also conducted a set of experiments on both an in-house Hadoop cluster
and on-demand Hadoop clusters in Amazon EC2 to validate the model. The results
show that the cost model fits the four tested programs well. Note that this
modeling and optimization framework also aligns with the goal of energy-efficient
computing, because it reduces the unnecessary possession and use of cloud
resources. If we can model the energy consumption profiles of the resources, we
can also precisely optimize the overall energy consumption with the proposed
framework.
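
As an illustration of the workflow summarized above (learn the model parameters
from test runs, then pick the cheapest configuration that meets a deadline), the
following minimal sketch may help. It is not the chapter's model: the functional
form T(m) = b0 + b1/m, the test-run numbers, the price per node-hour, and all
function names are assumptions made only for this example, and cost is taken to
be proportional to nodes times running time.

```python
# Sketch only: the model form T(m) = b0 + b1 / m and every number below are
# illustrative assumptions, not the chapter's fitted model or measurements.

def fit_time_model(test_runs):
    """Least-squares fit of T(m) = b0 + b1 / m from (nodes, seconds) pairs."""
    xs = [1.0 / m for m, _ in test_runs]   # regress on x = 1 / m
    ys = [t for _, t in test_runs]
    n = len(test_runs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx                  # slope: the part of the work that parallelizes
    b0 = y_mean - b1 * x_mean       # intercept: fixed overhead
    return b0, b1


def predict_seconds(params, nodes):
    """Predicted running time in seconds on the given number of nodes."""
    b0, b1 = params
    return b0 + b1 / nodes


def cheapest_within_deadline(params, deadline_s, price_per_node_hour, max_nodes):
    """Return (nodes, seconds, cost) with minimal cost under the deadline, or None."""
    best = None
    for nodes in range(1, max_nodes + 1):
        seconds = predict_seconds(params, nodes)
        if seconds > deadline_s:
            continue  # this configuration misses the deadline
        cost = nodes * (seconds / 3600.0) * price_per_node_hour
        if best is None or cost < best[2]:
            best = (nodes, seconds, cost)
    return best


# Hypothetical test runs: (number of nodes, measured running time in seconds).
TEST_RUNS = [(2, 3100.0), (4, 1600.0), (8, 850.0), (16, 475.0)]

params = fit_time_model(TEST_RUNS)
plan = cheapest_within_deadline(params, deadline_s=900.0,
                                price_per_node_hour=0.10, max_nodes=32)
print(params, plan)
```

The same search could minimize energy rather than money by replacing the price
per node-hour with an assumed energy profile per node, which is the extension
the text suggests.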

Future work includes (1) analyzing the model's prediction errors to improve
the modeling process, which might involve sample selection and model adjustment;
(2) conducting more experiments on different MapReduce programs and different
types of EC2 instances; and (3) extending the study to energy-efficient MapReduce
computing.

ACKNOWLEDGMENTS

This project is partly supported by the Ohio Board of Regents and Amazon Web
Services.
