Distributed Programming for the Cloud - Large Scale and Big Data: Processing and Management

[Figure labels: Client machine; Data center; .NET; LINQ expr; ToDryadTable; foreach; .NET objects; DryadLINQ; Compile; Invoke; Results; Vertex code; Exec plan; JM; Input tables; Dryad execution; Output tables; Output DryadTable; steps (1) to (9).]

FIGURE 2.22 LINQ-expression execution in DryadLINQ. (From Y. Yu et al., DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language, in OSDI, pp. 1-14, 2008.)

9. Control returns to the user application. The iterator interface over a Dryad table allows the user to read its contents as .NET objects.

10. The application may generate subsequent DryadLINQ expressions that can be executed by a repetition of Steps 2 to 9.

A commercial implementation of Dryad and DryadLINQ was released in 2011 under the name LINQ to HPC.*

2.6.3 Spark

The Spark system [135,136] was proposed to support applications that need to reuse a working set of data across multiple parallel operations (e.g., iterative machine learning algorithms and interactive data analytics) while retaining the scalability and fault tolerance of MapReduce. To achieve these goals, Spark introduces an abstraction called resilient distributed datasets (RDDs). An RDD is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost. Users can explicitly cache an RDD in memory across machines and reuse it in multiple MapReduce-like parallel operations. RDDs do not need to be materialized at all times; they achieve fault tolerance through a notion of lineage. In particular, each RDD object contains a pointer to its parent and information about how the parent was transformed. Hence, if a partition of an RDD is lost, the RDD has sufficient information about how it was derived from other RDDs to rebuild just that partition.
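The lineage idea described above can be illustrated with a small single-process sketch. This is not Spark's API: the class `ToyRDD` and its methods (`from_partitions`, `map`, `cache`, `partition`, `collect`, `lose_partition`) are hypothetical names invented for illustration, and "machines" are simulated by list indices. The sketch only assumes what the text states: each dataset keeps a pointer to its parent plus the transformation applied, caching is explicit, and a lost partition is recomputed on its own.

```python
class ToyRDD:
    """Toy stand-in for an RDD: a read-only, partitioned collection with lineage.

    Hypothetical illustration only; this is not the Spark API.
    """

    def __init__(self, num_partitions, source=None, parent=None, transform=None):
        self.num_partitions = num_partitions
        self._source = source        # base data, one list per partition (root only)
        self.parent = parent         # lineage: pointer to the parent dataset
        self.transform = transform   # lineage: how a parent partition becomes ours
        self._cached = False         # caching is opt-in, as in the text
        self._store = {}             # partition index -> materialized list
        self.compute_count = 0       # number of partition computations performed

    @classmethod
    def from_partitions(cls, partitions):
        return cls(len(partitions), source=[list(p) for p in partitions])

    def map(self, fn):
        # Nothing is computed here; only the lineage is recorded.
        return ToyRDD(self.num_partitions, parent=self,
                      transform=lambda part: [fn(x) for x in part])

    def cache(self):
        self._cached = True
        return self

    def partition(self, i):
        # Serve from the cache if present; otherwise rebuild from lineage.
        if self._cached and i in self._store:
            return self._store[i]
        if self.parent is None:
            data = list(self._source[i])
        else:
            data = self.transform(self.parent.partition(i))
        self.compute_count += 1
        if self._cached:
            self._store[i] = data
        return data

    def collect(self):
        return [x for i in range(self.num_partitions) for x in self.partition(i)]

    def lose_partition(self, i):
        # Simulate a machine failure that drops one cached partition.
        self._store.pop(i, None)


base = ToyRDD.from_partitions([[1, 2], [3, 4], [5, 6]])
squares = base.map(lambda x: x * x).cache()

# Two parallel-style operations reuse the same cached working set.
total = sum(squares.collect())
largest = max(squares.collect())
computed_before_loss = squares.compute_count   # 3: one per partition

# Losing one partition triggers recomputation of that partition only.
squares.lose_partition(1)
rebuilt = squares.collect()
computed_after_loss = squares.compute_count    # 4: only partition 1 was redone
```

In this sketch the second `collect()` does no new work because every partition is cached, and after the simulated failure the counter rises by exactly one, which mirrors the claim that lineage lets the system rebuild just the lost partition.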

* http://msdn.microsoft.com/en-us/library/hh378101.aspx.
