Large-Scale File Systems and Map-Reduce - Mining of Massive Datasets


parallel tasks. That responsibility was automated through the introduction of

DryadLINQ [17]. For a discussion of cluster implementation of recursion, see

[1]. Pregel is from [14].

A different approach to recursion was taken in HaLoop [5]. There, recursion

is seen as an iteration, with the output of one round being input to the next

round. Efficiency is obtained by managing the location of the intermediate data

and the tasks that implement each round.

The communication-cost model for algorithms comes from [2]. [3] discusses

optimal implementations of multiway joins using a map-reduce system.

There are a number of other systems built on a distributed file system and/or

map-reduce, which have not been covered here, but may be worth knowing

about. [6] describes BigTable, a Google implementation of an object store of

very large size. A somewhat different direction was taken at Yahoo! with Pnuts

[7]. The latter supports a limited form of transaction processing, for example.

PIG [15] is an implementation of relational algebra on top of Hadoop.

Similarly, Hive [12] implements a restricted form of SQL on top of Hadoop.

1. F.N. Afrati, V. Borkar, M. Carey, A. Polyzotis, and J.D. Ullman,

“Cluster computing, recursion, and Datalog,” to appear in Proc. Datalog 2.0

Workshop, Elsevier, 2011.

2. F.N. Afrati and J.D. Ullman, “A new computation model for cluster

computing,” http://ilpubs.stanford.edu:8090/953, Stanford Dept. of CS

Technical Report, 2009.

3. F.N. Afrati and J.D. Ullman, “Optimizing joins in a map-reduce

environment,” Proc. Thirteenth Intl. Conf. on Extending Database Technology,

2010.

4. V. Borkar and M. Carey, “Hyrax: demonstrating a new foundation for

data-parallel computation,”

http://asterix.ics.uci.edu/pub/hyraxdemo.pdf,

Univ. of California, Irvine, 2010.

5. Y. Bu, B. Howe, M. Balazinska, and M. Ernst, “HaLoop: efficient

iterative data processing on large clusters,” Proc. Intl. Conf. on Very Large

Databases, 2010.

6. F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows,

T. Chandra, A. Fikes, and R.E. Gruber, “Bigtable: a distributed storage

system for structured data,” ACM Transactions on Computer Systems

26:2, pp. 1-26, 2008.

7. B.F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein,

P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni, “Pnuts:

Yahoo!'s hosted data serving platform,” PVLDB 1:2, pp. 1277-1288, 2008.
