MapReduce and the New Software Stack - Mining of Massive Datasets

[2] F.N. Afrati, A. Das Sarma, S. Salihoglu, and J.D. Ullman, “Upper and lower bounds on the cost of a MapReduce computation,” to appear in Proc. Intl. Conf. on Very Large Databases, 2013. Also available as CoRR, abs/1206.4377.

[3] F.N. Afrati and J.D. Ullman, “Optimizing joins in a MapReduce environment,” Proc. Thirteenth Intl. Conf. on Extending Database Technology, 2010.

[4] V. Borkar and M. Carey, “Hyrax: demonstrating a new foundation for data-parallel computation,” http://asterix.ics.uci.edu/pub/hyraxdemo.pdf, Univ. of California, Irvine, 2010.

[5] Y. Bu, B. Howe, M. Balazinska, and M. Ernst, “HaLoop: efficient iterative data processing on large clusters,” Proc. Intl. Conf. on Very Large Databases, 2010.

[6] F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R.E. Gruber, “Bigtable: a distributed storage system for structured data,” ACM Transactions on Computer Systems 26:2, pp. 1-26, 2008.

[7] B.F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni, “PNUTS: Yahoo!'s hosted data serving platform,” PVLDB 1:2, pp. 1277-1288, 2008.

[8] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Comm. ACM 51:1, pp. 107-113, 2008.

[9] D.J. DeWitt, E. Paulson, E. Robinson, J.F. Naughton, J. Royalty, S. Shankar, and A. Krioukov, “Clustera: an integrated computation and data management system,” PVLDB 1:1, pp. 28-41, 2008.

[10] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file system,” 19th ACM Symposium on Operating Systems Principles, 2003.

[11] hadoop.apache.org, Apache Foundation.

[12] hadoop.apache.org/hive, Apache Foundation.

[13] M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad: distributed data-parallel programs from sequential building blocks,” Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, pp. 59-72, ACM, 2007.

[14] G. Malewicz, M.H. Austern, A.J.C. Bik, J.C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, “Pregel: a system for large-scale graph processing,” Proc. ACM SIGMOD Conference, 2010.

[15] C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins, “Pig Latin: a not-so-foreign language for data processing,” Proc. ACM SIGMOD Conference, pp. 1099-1110, 2008.

[16] J.D. Ullman and J. Widom, A First Course in Database Systems, Third Edition, Prentice-Hall, Upper Saddle River, NJ, 2008.

[17] Y. Yu, M. Isard, D. Fetterly, M. Budiu, I. Erlingsson, P.K. Gunda, and J. Currey, “DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language,” OSDI, pp. 1-14, USENIX Association, 2008.

1 Optionally, users can specify their own hash function or other method for assigning keys to Reduce tasks. However, whatever algorithm is used, each key is assigned to one and only one Reduce task.
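
The rule in footnote 1 can be illustrated with a minimal Python sketch. The names `assign_reduce_task` and `partition` are hypothetical, and the CRC32-modulo hash is an assumption chosen only because it is deterministic; it is not taken from any particular MapReduce implementation. The point is that the assignment is a function of the key alone, so every pair with the same key lands at the same Reduce task, whether the default or a user-supplied method is used.

```python
import zlib


def assign_reduce_task(key, num_reduce_tasks):
    """Default assignment (an assumption): deterministic hash of the key, modulo the task count."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def partition(pairs, num_reduce_tasks, assign=assign_reduce_task):
    """Route each (key, value) pair to the bucket of exactly one Reduce task.

    `assign` may be replaced by a user-supplied function, mirroring the
    option described in the footnote.
    """
    buckets = [[] for _ in range(num_reduce_tasks)]
    for key, value in pairs:
        buckets[assign(key, num_reduce_tasks)].append((key, value))
    return buckets
```

A user-supplied method is passed as `assign`, for example `partition(pairs, 4, assign=lambda key, r: len(key) % r)`. Because the chosen task depends only on the key, no key can be split across two Reduce tasks.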

2 Remember that even looking at a product you don't buy causes Amazon to remember that you looked at it.

3 The matrix is sparse, with on average 10 to 15 nonzero elements per row, since the matrix represents the links in the Web, with $m_{ij}$ nonzero if and only if there is a link from page j to page i. Note that there is no way we could store a dense matrix whose side was $10^{10}$, since it would have $10^{20}$ elements.
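
The arithmetic behind footnote 3 can be checked with a short Python sketch. The function name `element_counts` is hypothetical; the side length and the range of 10 to 15 nonzero elements per row are the figures given in the footnote.

```python
def element_counts(side, nonzeros_per_row):
    """Return (dense element count, sparse nonzero count) for a square matrix."""
    dense = side * side              # every entry stored
    sparse = side * nonzeros_per_row  # only the nonzero entries stored
    return dense, sparse


side = 10**10
dense, sparse_low = element_counts(side, 10)
_, sparse_high = element_counts(side, 15)
print(dense)                    # 10**20 elements in the dense representation
print(sparse_low, sparse_high)  # between 10**11 and 1.5 * 10**11 nonzero elements
```

Under these figures the sparse representation holds roughly a billion times fewer entries than the dense one, which is why only the sparse form is feasible.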

4 Some descriptions of relational algebra do not include these operations, and indeed they were not part of the original definition of this algebra. However, these operations are so important in SQL that modern treatments of relational algebra include them.
