parameters can be learned from test runs. Based on this cost model, we can solve
a number of decision problems, such as finding the amount of resources that
minimizes the financial cost subject to a financial budget or a time deadline.
We have also conducted a set of experiments on both an in-house Hadoop cluster
and on-demand Hadoop clusters in Amazon EC2 to validate the model. The results
show that the cost model fits the four tested programs well. Note that this modeling
and optimization framework also aligns with the goal of energy-efficient computing
by reducing the unnecessary possession and use of cloud resources. If we can model
the energy consumption profiles of the resources, we can also precisely optimize the
overall energy consumption with the proposed framework.
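As a rough illustration of the workflow summarized above (learn model parameters from test runs, then pick the amount of resources under a deadline), the following minimal sketch may help. Everything in it is an assumption made for illustration: the model form T(n) = a + b/n, the function names, the test-run timings, the hourly price, and the fractional-hour billing are all hypothetical and are not the cost model, data, or pricing used in this chapter.

```python
# Hypothetical sketch. The model form T(n) = a + b/n, the sample timings, and
# the price are illustrative assumptions, not the chapter's model or results.

def fit_time_model(runs):
    """Least-squares fit of T(n) = a + b/n from (nodes, seconds) test runs."""
    xs = [1.0 / n for n, _ in runs]          # regress time on x = 1/n
    ys = [t for _, t in runs]
    m = len(runs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                            # slope: parallelizable work
    a = mean_y - b * mean_x                  # intercept: fixed overhead
    return a, b


def cheapest_within_deadline(a, b, price_per_node_hour, deadline_s, max_nodes):
    """Search node counts 1..max_nodes for the cheapest one meeting the deadline.

    Assumes cost = price * nodes * hours with fractional-hour billing.
    Returns (nodes, cost, predicted_seconds), or None if no count is feasible.
    """
    best = None
    for n in range(1, max_nodes + 1):
        t = a + b / n                        # predicted running time in seconds
        if t > deadline_s:
            continue                         # violates the time deadline
        cost = price_per_node_hour * n * t / 3600.0
        if best is None or cost < best[1]:
            best = (n, cost, t)
    return best


# Made-up test runs: (number of nodes, measured seconds).
runs = [(2, 2060.0), (4, 1060.0), (8, 560.0), (16, 310.0)]
a, b = fit_time_model(runs)
best = cheapest_within_deadline(a, b, price_per_node_hour=0.10,
                                deadline_s=600.0, max_nodes=32)
print(best)
```

The same search loop could instead minimize predicted time subject to a cost budget by swapping the roles of the constraint and the objective; a brute-force scan is used here only because the number of candidate node counts is small.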
Future work includes (1) understanding the model prediction errors to improve
the modeling process, which might involve sample selection and model adjustment,
(2) conducting more experiments on different MapReduce programs and different types
of EC2 instances, and (3) extending the study to energy-efficient MapReduce computing.
ACKNOWLEDGMENTS
This project is partly supported by the Ohio Board of Regents and Amazon Web
Services.