Spark does not currently support a grouped reduce operation as in MapReduce.
The results of reduce operations are collected only at the driver process. In addition,
Spark supports two restricted types of shared variables that cover two simple but
common usage patterns:
Broadcast variables: Objects that wrap a value and ensure that it is
copied to each worker only once.
Accumulators: Variables that workers can only add to using an
associative operation and that only the driver can read.
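The semantics of the two shared-variable types can be illustrated with a toy, single-process model in plain Python. This is not Spark's API: the classes `Broadcast` and `Accumulator`, the helper `run_on_workers`, and the sample data are hypothetical names invented for this sketch, and the "workers" are simulated by an ordinary loop.

```python
import operator


class Broadcast:
    """Read-only wrapper around a value that is shipped to each worker once."""

    def __init__(self, value):
        self._value = value
        self._seen = set()       # workers that already hold a copy
        self.copies_shipped = 0  # number of copies sent so far

    def value_on(self, worker_id):
        # Ship the value only the first time a given worker asks for it.
        if worker_id not in self._seen:
            self._seen.add(worker_id)
            self.copies_shipped += 1
        return self._value


class Accumulator:
    """Workers may only add to it; only the driver reads the total."""

    def __init__(self, zero, op=operator.add):
        self._total = zero
        self._op = op  # assumed to be associative

    def add(self, x):
        self._total = self._op(self._total, x)

    def driver_value(self):
        return self._total


def run_on_workers(partitions, lookup, misses):
    """Simulate each worker resolving the keys of its own partition."""
    results = []
    for worker_id, keys in enumerate(partitions):
        for key in keys:
            table = lookup.value_on(worker_id)
            if key in table:
                results.append(table[key])
            else:
                misses.add(1)  # count keys missing from the lookup table
    return results


lookup = Broadcast({"a": 1, "b": 2})
misses = Accumulator(0)
resolved = run_on_workers([["a", "x"], ["b", "a", "y"]], lookup, misses)
```

In this sketch the lookup table is shipped twice (once per simulated worker) even though it is read five times, and the miss counter is read only through `driver_value()` after the workers have finished.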
Spark can also be used interactively from a modified version
of the Scala interpreter that allows the user to define RDDs, functions, variables,
and classes and use them in parallel operations on a cluster.
2.6.4 Nephele/PACT
The Nephele/PACT system [8,15] has been presented as a parallel data processor
centered around a programming model of so-called Parallelization Contracts
(PACTs) and the scalable parallel execution engine Nephele. The PACT programming
model is a generalization of map/reduce, as it is based on a key/value data model
and the concept of Parallelization Contracts (PACTs). A PACT consists of exactly
one second-order function, called the Input Contract, and an optional Output
Contract. An Input Contract takes a first-order function with task-specific user code
and one or more data sets as input parameters. The Input Contract invokes its associated
first-order function with independent subsets of its input data in a data-parallel
fashion. In this context, the two functions of map and reduce are just examples of
Input Contracts. Other examples of Input Contracts include:
The Cross contract operates on multiple inputs and builds a distributed
Cartesian product over its input sets.
The CoGroup contract partitions each of its multiple inputs along the key.
Independent subsets are built by combining equal keys of all inputs.
The Match contract operates on multiple inputs. It matches key/value pairs
from all input data sets with the same key (equivalent to the inner join
operation).
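How each Input Contract forms the independent subsets handed to the user function can be sketched with a sequential, in-memory model in plain Python. The function names below (`map_contract`, `reduce_contract`, `cross_contract`, `cogroup_contract`, `match_contract`) are hypothetical and are not the Nephele/PACT API; the sketch assumes two inputs, keys that can be sorted (only to make the output order deterministic), and user functions that return a list of results. A real engine would process the subsets in parallel.

```python
from collections import defaultdict
from itertools import product


def group_by_key(pairs):
    """Collect the values of (key, value) pairs per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def map_contract(udf, data):
    # Every single pair is its own independent subset.
    return [out for pair in data for out in udf(pair)]


def reduce_contract(udf, data):
    # One subset per key: all values that share that key.
    groups = group_by_key(data)
    return [out for k in sorted(groups) for out in udf(k, groups[k])]


def cross_contract(udf, left, right):
    # One subset per element of the Cartesian product of the inputs.
    return [out for lp, rp in product(left, right) for out in udf(lp, rp)]


def cogroup_contract(udf, left, right):
    # One subset per key found in either input; a side may be empty.
    lg, rg = group_by_key(left), group_by_key(right)
    keys = sorted(set(lg) | set(rg))
    return [out for k in keys for out in udf(k, lg.get(k, []), rg.get(k, []))]


def match_contract(udf, left, right):
    # One subset per pair of values with equal keys (inner join).
    lg, rg = group_by_key(left), group_by_key(right)
    keys = sorted(set(lg) & set(rg))
    return [out for k in keys
            for lv, rv in product(lg[k], rg[k])
            for out in udf(k, lv, rv)]


# Hypothetical sample data: (user_id, item) and (user_id, name).
orders = [(1, "book"), (2, "pen"), (1, "lamp")]
users = [(1, "ann"), (3, "bob")]

per_key = reduce_contract(lambda k, vs: [(k, len(vs))], orders)
joined = match_contract(lambda k, o, u: [(k, (o, u))], orders, users)
sizes = cogroup_contract(lambda k, os_, us: [(k, (len(os_), len(us)))],
                         orders, users)
all_pairs = cross_contract(lambda o, u: [(o, u)], orders, users)
```

With this data, `match_contract` produces results only for key 1, the one key present in both inputs, whereas `cogroup_contract` calls the user function for keys 1, 2, and 3, passing an empty list for the side that lacks the key.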
An Output Contract is an optional component of a PACT and gives guarantees
about the data that is generated by the assigned user function. The set of Output
Contracts includes:
The Same-Key contract, where each key/value pair that is generated by the
function has the same key as the key/value pair(s) from which it was generated.
This means the function will preserve any partitioning and order
property on the keys.
The Super-Key contract, where each key/value pair that is generated by the function
has a superkey of the key/value pair(s) from which it was generated. This