parallel tasks. That responsibility was automated through the introduction of
DryadLINQ [17]. For a discussion of cluster implementation of recursion, see
[1]. Pregel is from [14].
A different approach to recursion was taken in HaLoop [5]. There, recursion
is treated as iteration, with the output of one round serving as input to the
next round. Efficiency is obtained by managing where the intermediate data
and the tasks that implement each round are located.
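The round-by-round view of recursion can be illustrated with a small, single-process Python sketch. It simulates each map-reduce round in memory and uses the transitive closure of a graph as the recursive query. The helper names (`map_reduce`, `path_mapper`, `path_reducer`, `transitive_closure`) are hypothetical and are not HaLoop's API, and the sketch does not model HaLoop's placement of data and tasks. It only shows the iteration: the paths produced by one round become input to the next, and the loop stops when a round adds no new pair.

```python
from collections import defaultdict

# Hypothetical helpers for illustration only; this is not HaLoop's API.


def map_reduce(records, mapper, reducer):
    """Simulate one map-reduce round in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": group values by key
    output = set()
    for key, values in groups.items():
        output.update(reducer(key, values))
    return output


def path_mapper(record):
    """Key a path (x, y) by its end y, and an edge (y, z) by its start y."""
    tag, (a, b) = record
    if tag == "path":
        yield b, ("path", a)
    else:
        yield a, ("edge", b)


def path_reducer(key, values):
    """Join every path ending at `key` with every edge leaving `key`."""
    starts = [v for tag, v in values if tag == "path"]
    ends = [v for tag, v in values if tag == "edge"]
    return {(x, z) for x in starts for z in ends}


def transitive_closure(edges):
    """Repeat the join round until it produces no new (source, target) pair.

    Returns the closure and the number of rounds run, including the
    final round that found nothing new.
    """
    edges = set(edges)
    paths = set(edges)
    rounds = 0
    while True:
        rounds += 1
        # The output of the previous round (paths) is input to this round;
        # the edge relation is re-read unchanged every time.
        tagged = [("path", p) for p in paths] + [("edge", e) for e in edges]
        new_paths = paths | map_reduce(tagged, path_mapper, path_reducer)
        if new_paths == paths:  # fixpoint reached
            return paths, rounds
        paths = new_paths


closure, rounds = transitive_closure([(1, 2), (2, 3), (3, 4)])
print(sorted(closure), rounds)
# prints [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)] 3
```

The edge relation is identical in every round. Loop-invariant data of this kind is what a system such as HaLoop can keep close to the tasks that reuse it, whereas this sketch simply re-reads it each time.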
The communication-cost model for algorithms comes from [2]. Reference [3]
discusses optimal implementations of multiway joins using a map-reduce system.
There are a number of other systems built on a distributed file system and/or
map-reduce, which have not been covered here, but may be worth knowing
about. [6] describes BigTable, a Google implementation of an object store of
very large size. A somewhat different direction was taken at Yahoo! with Pnuts
[7]. The latter supports a limited form of transaction processing, for example.
PIG [15] is an implementation of relational algebra on top of Hadoop.
Similarly, Hive [12] implements a restricted form of SQL on top of Hadoop.
1. F.N. Afrati, V. Borkar, M. Carey, A. Polyzotis, and J.D. Ullman, “Cluster computing, recursion, and Datalog,” to appear in Proc. Datalog 2.0 Workshop, Elsevier, 2011.
2. F.N. Afrati and J.D. Ullman, “A new computation model for cluster computing,” http://ilpubs.stanford.edu:8090/953, Stanford Dept. of CS Technical Report, 2009.
3. F.N. Afrati and J.D. Ullman, “Optimizing joins in a map-reduce environment,” Proc. Thirteenth Intl. Conf. on Extending Database Technology, 2010.
4. V. Borkar and M. Carey, “Hyrax: demonstrating a new foundation for data-parallel computation,” http://asterix.ics.uci.edu/pub/hyraxdemo.pdf, Univ. of California, Irvine, 2010.
5. Y. Bu, B. Howe, M. Balazinska, and M. Ernst, “HaLoop: efficient iterative data processing on large clusters,” Proc. Intl. Conf. on Very Large Databases, 2010.
6. F. Chang, J. Dean, S. Ghemawat, W.C. Hsieh, D.A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R.E. Gruber, “Bigtable: a distributed storage system for structured data,” ACM Transactions on Computer Systems 26:2, pp. 1-26, 2008.
7. B.F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni, “Pnuts: Yahoo!'s hosted data serving platform,” PVLDB 1:2, pp. 1277-1288, 2008.