and the static data, are all stored in files. This key difference results in different
implementation mechanisms, including different data transfer methods and different
techniques for joining static data with state data. Furthermore, iMapReduce supports
asynchronous map execution, which further improves performance. In addition,
iMapReduce is implemented on top of Hadoop MapReduce, so iterative applications
written for Hadoop can be easily modified to run on iMapReduce.
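To make the idea of joining static data with state data concrete, the following is a minimal sketch, not iMapReduce's actual implementation. It assumes that both data sets are keyed by the same integer IDs and are split with the same partition function, so that each state partition can be joined with its static partition locally. The names partition_of, split, and local_join are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def partition_of(key: int, num_partitions: int) -> int:
    """Assign a key to a partition (assumed: simple modulo partitioning)."""
    return key % num_partitions


def split(records: Dict[int, object], num_partitions: int) -> List[Dict[int, object]]:
    """Split keyed records into partitions using partition_of."""
    parts: List[Dict[int, object]] = [{} for _ in range(num_partitions)]
    for key, value in records.items():
        parts[partition_of(key, num_partitions)][key] = value
    return parts


def local_join(state_part: Dict[int, object],
               static_part: Dict[int, object]) -> List[Tuple[int, object, object]]:
    """Join one state partition with the matching static partition by key."""
    return [(key, state_part[key], static_part[key])
            for key in sorted(state_part) if key in static_part]


# Static data: structure that does not change across iterations (e.g. out-links).
static_data = {0: [1, 2], 1: [2], 2: [0], 3: [0]}
# State data: values that are updated in every iteration (e.g. scores).
state_data = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

static_parts = split(static_data, 2)
state_parts = split(state_data, 2)

# Because both sides use the same partition function, partition i of the
# state data only ever needs partition i of the static data.
joined = [local_join(s, t) for s, t in zip(state_parts, static_parts)]
```

In this sketch only the small state partitions would need to move between iterations, while each static partition can stay where it was first loaded.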
3.7 SUMMARY
Hadoop MapReduce, which uses a batch processing model, has several limitations in
supporting iterative computations. iMapReduce extends the MapReduce framework
to improve the performance of iterative computations in a large cluster
environment. It extracts the common features of iterative algorithms
and provides built-in support for iterative processing. In particular, iMapReduce
(1) builds an internal loop from reduce to map within a job to avoid the job startup
overhead, (2) allows asynchronous map task execution to avoid the synchronization
overhead, and (3) separates the iterated state data from the static structure data to
avoid the communication overhead. Together, these optimizations substantially
improve system performance.
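The loop structure summarized above can be sketched in a few lines of single-process Python. This is an illustration under stated assumptions, not the iMapReduce API: a PageRank-style update stands in for the iterative workload, the damping factor, tolerance, and the function names map_phase, reduce_phase, and run_iterative_job are all hypothetical, and the code runs sequentially, so asynchronous map execution is not modeled. What it does show is one "job" that loads the static data once, feeds the reduce output straight back into the next map phase, and passes only the state data between iterations.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DAMPING = 0.85  # assumed damping factor for the illustrative workload


def map_phase(state: Dict[int, float],
              static: Dict[int, List[int]]) -> List[Tuple[int, float]]:
    """Join each node's score (state) with its out-links (static) and emit shares."""
    emitted = []
    for node, score in state.items():
        links = static[node]
        for target in links:
            emitted.append((target, score / len(links)))
    return emitted


def reduce_phase(emitted: List[Tuple[int, float]],
                 nodes: List[int]) -> Dict[int, float]:
    """Sum the shares per node and produce the next iteration's state."""
    totals = defaultdict(float)
    for target, share in emitted:
        totals[target] += share
    n = len(nodes)
    return {node: (1 - DAMPING) / n + DAMPING * totals[node] for node in nodes}


def run_iterative_job(static: Dict[int, List[int]],
                      max_iterations: int = 100,
                      tolerance: float = 1e-6) -> Tuple[Dict[int, float], int]:
    """Run all iterations inside one job: static data is loaded once,
    and only the state flows from reduce back to map."""
    nodes = list(static)
    state = {node: 1.0 / len(nodes) for node in nodes}
    for iteration in range(1, max_iterations + 1):
        new_state = reduce_phase(map_phase(state, static), nodes)
        delta = sum(abs(new_state[node] - state[node]) for node in nodes)
        state = new_state  # reduce output becomes the next map input
        if delta < tolerance:
            break
    return state, iteration


# Small example graph; every node has at least one out-link.
ranks, iterations = run_iterative_job({0: [1, 2], 1: [2], 2: [0]})
```

By contrast, a plain MapReduce driver would submit a new job for each pass of the loop and reload the static graph every time, which is the startup and communication overhead the summary refers to.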
ACKNOWLEDGMENTS
This work was partially supported by U.S. NSF grants (CCF-1018114, CNS-1217284),
the National Natural Science Foundation of China (61300023), and the Fundamental
Research Funds for the Central Universities (N120416001, N120816001).