[Figure: data flow, in numbered steps (1) to (9), between the client machine (.NET application, LINQ expression, DryadLINQ, ToDryadTable, foreach over .NET objects, output DryadTable) and the data center (job manager (JM), vertex code, execution plan, input tables, Dryad execution, output tables).]
FIGURE 2.22 LINQ-expression execution in DryadLINQ. (From Y. Yu et al., DryadLINQ:
A system for general-purpose distributed data-parallel computing using a high-level
language, in OSDI, pp. 1-14, 2008.)
9. Control returns to the user application. The iterator interface over a Dryad
table allows the user to read its contents as .NET objects.
10. The application may generate subsequent DryadLINQ expressions that can
be executed by a repetition of Steps 2 to 9.
A commercial implementation of Dryad and DryadLINQ was released in 2011
under the name LINQ to HPC.*
2.6.3 Spark
The Spark system [135,136] was proposed to support applications that need
to reuse a working set of data across multiple parallel operations (e.g., iterative
machine learning algorithms and interactive data analytics) while retaining the
scalability and fault tolerance of MapReduce. To achieve these goals, Spark introduces
an abstraction called resilient distributed data sets (RDDs). An RDD is a read-only
collection of objects partitioned across a set of machines that can be rebuilt if a
partition is lost. Users can therefore explicitly cache an RDD in memory across
machines and reuse it in multiple MapReduce-like parallel operations. RDDs do not
need to be materialized at all times. RDDs achieve fault tolerance through a notion
of lineage. In particular, each RDD object contains a pointer to its parent and
information about how the parent was transformed. Hence, if a partition of an RDD is
lost, the RDD has sufficient information about how it was derived from other RDDs
to be able to rebuild just that partition.
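
The lineage idea can be illustrated with a small, single-process sketch. This is a toy model in plain Python, not Spark's API: the class ToyRDD and its methods (from_partitions, persist, partition, lose_partition, collect) and the compute_log field are hypothetical names introduced here for illustration only. Each data set keeps a pointer to its parent and the function that transformed the parent, so a lost partition can be recomputed on its own.

```python
class ToyRDD:
    """Toy read-only partitioned collection that records its lineage.

    Hypothetical illustration only; this is not the Spark API.
    """

    def __init__(self, num_partitions, parent=None, transform=None, source=None):
        self.num_partitions = num_partitions
        self.parent = parent        # lineage: pointer to the parent data set
        self._transform = transform  # lineage: how a parent partition was transformed
        self._source = source        # base partitions (root data set only)
        self._persist = False        # whether computed partitions are kept in memory
        self._cache = {}             # partition index -> materialized list
        self.compute_log = []        # indices of partitions actually (re)computed

    @staticmethod
    def from_partitions(partitions):
        return ToyRDD(len(partitions), source=[list(p) for p in partitions])

    def map(self, fn):
        return ToyRDD(self.num_partitions, parent=self,
                      transform=lambda part: [fn(x) for x in part])

    def filter(self, pred):
        return ToyRDD(self.num_partitions, parent=self,
                      transform=lambda part: [x for x in part if pred(x)])

    def persist(self):
        """Ask for computed partitions to be cached for reuse."""
        self._persist = True
        return self

    def partition(self, i):
        """Return partition i, computing it from the lineage on a cache miss."""
        if i in self._cache:
            return self._cache[i]
        self.compute_log.append(i)
        if self.parent is None:
            data = list(self._source[i])
        else:
            data = self._transform(self.parent.partition(i))
        if self._persist:
            self._cache[i] = data
        return data

    def lose_partition(self, i):
        """Simulate losing the in-memory copy of partition i."""
        self._cache.pop(i, None)

    def collect(self):
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


base = ToyRDD.from_partitions([[1, 2, 3], [4, 5, 6]])
squares = base.map(lambda x: x * x).persist()
evens = squares.filter(lambda x: x % 2 == 0)

squares_all = squares.collect()   # materializes and caches both partitions
squares.lose_partition(1)         # simulate a failure affecting partition 1
rebuilt = squares.partition(1)    # only partition 1 is recomputed from base
evens_all = evens.collect()       # reuses the cached partitions of squares
```

In this sketch, squares.compute_log shows that partition 0 was computed once while partition 1 was computed twice, which mirrors the claim that only the lost partition has to be rebuilt. A real system would distribute partitions across machines and read the root data from stable storage; both are simplified away here.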
* http://msdn.microsoft.com/en-us/library/hh378101.aspx.