[Figure: two line plots, (a) and (b), each comparing MapReduce with iMapReduce; x-axis: Iterations; y-axis: running time.]
FIGURE 3.6 The running time of K-means (a) and MPI (b).
3.6 RELATED WORK
MapReduce, as a popular distributed framework for data-intensive computation,
has gained considerable attention over the past few years [4]. The framework has
been extended to meet diverse application requirements. MapReduce Online [3] pipelines data between the map and reduce operations and performs online aggregation to support efficient online queries, which directly inspires our work.
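To make the idea of online aggregation concrete, here is a toy Java sketch (illustrative only, not MapReduce Online's actual API; the class and method names are invented): as map output is pipelined to a reducer, each pair is folded into a running aggregate, and periodic snapshots give early, approximate answers before all input has arrived.

import java.util.HashMap;
import java.util.Map;

public class OnlineAggregationSketch {
    private final Map<String, Long> counts = new HashMap<>();
    private final int snapshotEvery;
    private long seen = 0;

    OnlineAggregationSketch(int snapshotEvery) { this.snapshotEvery = snapshotEvery; }

    // Called as each (key, value) pair of map output arrives,
    // rather than after all map tasks have finished.
    void accept(String key, long value) {
        counts.merge(key, value, Long::sum);
        if (++seen % snapshotEvery == 0) {
            // Early, approximate answer over the data seen so far.
            System.out.println("snapshot after " + seen + " pairs: " + counts);
        }
    }

    public static void main(String[] args) {
        OnlineAggregationSketch agg = new OnlineAggregationSketch(2);
        String[] keys = {"a", "b", "a", "c", "b", "a"};
        for (String k : keys) agg.accept(k, 1L); // simulated pipelined stream
        System.out.println("final: " + agg.counts);
    }
}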
To support the implementation of large-scale iterative algorithms, a number of studies have proposed new distributed computing frameworks for iterative processing [2,5,8-10,13,14,16,17,20].
A class of these efforts targets managing static data efficiently. Design patterns for running efficient graph algorithms in MapReduce have been introduced in [10]: the static graph adjacency lists are partitioned into n parts and pre-stored on the distributed file system (DFS). However, since the MapReduce framework assigns reduce tasks to workers arbitrarily, accessing the adjacency lists can involve remote reads, so local access to the static data cannot be guaranteed (a sketch of this partitioning appears below). HaLoop [2] is proposed, aiming at iterative processing.
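As a rough illustration of the partitioning pattern from [10] described above (a minimal Java sketch under assumed names; the vertices and class are hypothetical, not the paper's code), the adjacency lists below are bucketed with the same hash rule Hadoop's default partitioner uses to assign keys to reduce tasks:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AdjacencyListPartitioner {
    // Mirrors the logic of Hadoop's default HashPartitioner.
    static int partition(String vertex, int numPartitions) {
        return (vertex.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int n = 3; // number of reduce tasks / static-data partitions
        Map<String, List<String>> graph = Map.of(
            "v1", List.of("v2", "v3"),
            "v2", List.of("v3"),
            "v3", List.of("v1"),
            "v4", List.of("v1", "v2"));

        // Bucket each adjacency list by the partition of its vertex; in
        // the pattern of [10], each bucket would be pre-stored on the DFS
        // as its own file so that reduce task i can read partition i.
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) parts.add(new ArrayList<>());
        graph.forEach((v, adj) -> parts.get(partition(v, n)).add(v + " -> " + adj));

        for (int i = 0; i < n; i++)
            System.out.println("partition " + i + ": " + parts.get(i));
    }
}

Using the reducer's own hash function keeps graph partition i aligned with reduce task i's key range; the remaining gap, as noted above, is that the scheduler does not pin task i to the worker storing partition i, so remote reads can still occur.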
 