iMapReduce - Large Scale and Big Data: Processing and Management

and the static data, are all stored in files. This key difference leads to different implementation mechanisms, including different data transfer methods and different techniques for joining static data with state data. Furthermore, iMapReduce supports asynchronous map execution, which further improves performance. In addition, iMapReduce is implemented on top of Hadoop MapReduce, so iterative applications written for Hadoop can be easily modified to run on iMapReduce.

3.7 SUMMARY

Hadoop MapReduce, which uses a batch processing model, has several limitations in supporting iterative computations. iMapReduce extends the MapReduce framework to improve the performance of iterative computations in a large cluster environment. It extracts the common features of iterative algorithms and provides built-in support for iterative processing. In particular, iMapReduce (1) builds an internal loop from reduce to map within a job to avoid the job startup overhead, (2) allows asynchronous map task execution to avoid the synchronization overhead, and (3) separates the iterated state data from the static structure data to avoid the communication overhead. Together, these optimizations greatly improve system performance.
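
The loop structure and the separation of state data from static data can be pictured with a small single-process sketch. This is not the iMapReduce API: the names `map_fn`, `reduce_fn`, and `run_iterative_job` are hypothetical, the PageRank-style update is chosen only as a familiar iterative computation, and asynchronous map execution is not modeled. The sketch assumes every node has at least one outgoing link and that every link target is itself a key of the static graph.

```python
from collections import defaultdict


def map_fn(rank, out_links):
    """Join one state record (rank) with its static record (out_links).

    Emits (destination, contribution) pairs, as a map task would.
    """
    share = rank / len(out_links)
    return [(dst, share) for dst in out_links]


def reduce_fn(contributions, damping, n):
    """Combine the contributions for one node into its new state value."""
    return (1.0 - damping) / n + damping * sum(contributions)


def run_iterative_job(static_graph, iterations=20, damping=0.85):
    """Run all iterations inside one 'job' (hypothetical illustration).

    static_graph: dict mapping node -> list of out-links. It is loaded once
    and never rewritten, mirroring the static structure data.
    Returns a dict mapping node -> final state value.
    """
    n = len(static_graph)
    # Iterated state data: the only thing passed from reduce back to map.
    state = {node: 1.0 / n for node in static_graph}

    for _ in range(iterations):  # internal loop, no per-iteration job startup
        shuffled = defaultdict(list)
        for node, rank in state.items():
            # Map phase: state is joined with the locally held static data.
            for dst, value in map_fn(rank, static_graph[node]):
                shuffled[dst].append(value)
        # Reduce phase: its output becomes the next iteration's map input.
        state = {
            node: reduce_fn(shuffled[node], damping, n)
            for node in static_graph
        }
    return state
```

Calling `run_iterative_job({"a": ["b"], "b": ["a"]})` runs every iteration without re-reading or re-shuffling the graph itself; only the small `state` dictionary changes between iterations, which is the point of separating it from the static structure.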

ACKNOWLEDGMENTS

This work was partially supported by U.S. NSF grants (CCF-1018114, CNS-1217284), the National Natural Science Foundation of China (61300023), and the Fundamental Research Funds for the Central Universities (N120416001, N120816001).
