Database Reference
In-Depth Information
communication graph along with library code to schedule the work across the avail-
able resources. All data is sent directly between vertices, and thus, the job manager
is only responsible for control decisions and is not a bottleneck for any data transfers.
Therefore, much of the simplicity of the Dryad scheduler and fault-tolerance model
come from the assumption that vertices are deterministic.
Dryad has its own high-level language called DryadLINQ [133]. It generalizes
execution environments such as SQL and MapReduce in two ways: (1) adopting an
expressive data model of strongly typed .NET objects and (2) supporting general-
purpose imperative and declarative operations on data sets within a traditional high-
level programming language. DryadLINQ* exploits LINQ (Language INtegrated
Quer y, a set of .NET constructs for programming with data sets) to provide a pow-
erful hybrid of declarative and imperative programming. The system is designed to
provide flexible and efficient distributed computation in any LINQ-enabled program-
ming language including C#, VB, and F#. Objects in DryadLINQ data sets can be of
any .NET type, making it easy to compute with data such as image patches, vectors,
and matrices. In practice, a DryadLINQ program is a sequential program composed
of LINQ expressions that perform arbitrary side-effect-free transformations on data
sets and can be written and debugged using standard .NET development tools. The
DryadLINQ system automatically translates the data-parallel portions of the pro-
gram into a distributed execution plan, which is then passed to the Dryad execution
platform. Figure 2.22 illustrates the flow of execution of a DryadLINQ program
according to the following steps:
1. When a .NET user application runs, it creates a DryadLINQ expression object.
2. The application triggers a data-parallel execution where the expression
object is handed to DryadLINQ.
3. DryadLINQ compiles the LINQ expression into a distributed Dryad execu-
tion plan. In particular, it performs the following tasks:
a. Decomposing the expression into subexpressions where each expres-
sion can to be assigned to run in a separate Dryad vertex
b. Generating the code and static data for the remote Dryad vertices
c. Generating the serialization code for the required data types
4. DryadLINQ invokes a custom Dryad job manager.
5. The job manager creates the job graph and schedules the vertices as
resources become available.
6. Each Dryad vertex executes a vertex-specific program as created in Step 3(b).
7. When the Dryad job completes successfully it writes the data to the output
table(s).
8. The job manager process terminates and returns control back to DryadLINQ,
which creates objects encapsulating the outputs of the execution. These objects
may be used as inputs to subsequent expressions in the user program.
* http://research.microsoft.com/en-us/projects/dryadlinq/.
http://msdn.microsoft.com/en-us/netframework/aa904594.aspx.
http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/.
Search WWH ::




Custom Search