means the function will preserve a partitioning and partial order on the
keys.
The Unique-Key contract, where each key/value pair that is produced has a
unique key. The key must be unique across all parallel instances. Any produced
data is therefore partitioned and grouped by the key.
The Partitioned-by-Key contract, where key/value pairs are partitioned by key.
This contract has implications similar to those of the Super-Key contract,
specifically that a partitioning by the keys is given, but there is no order
inside the partitions.
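To make the difference between the last two contracts concrete, the following minimal Python sketch checks whether the output of several parallel instances satisfies them. It is an illustration only and not part of the PACT API: the function names and the representation of the output as one list of key/value pairs per parallel instance are assumptions made for this example.

```python
from typing import Hashable, Sequence, Tuple

Pair = Tuple[Hashable, object]
# Hypothetical representation: one inner sequence per parallel instance.
Partitions = Sequence[Sequence[Pair]]


def is_partitioned_by_key(partitions: Partitions) -> bool:
    """True if every key occurs in at most one partition (order is ignored)."""
    owner = {}
    for index, partition in enumerate(partitions):
        for key, _ in partition:
            # setdefault records the first partition in which the key was seen.
            if owner.setdefault(key, index) != index:
                return False
    return True


def is_unique_key(partitions: Partitions) -> bool:
    """True if no key occurs more than once across all partitions."""
    seen = set()
    for partition in partitions:
        for key, _ in partition:
            if key in seen:
                return False
            seen.add(key)
    return True
```

In this sketch, an output such as `[[("a", 1), ("a", 2)], [("b", 3)]]` is partitioned by key but does not have unique keys, whereas any output with unique keys is also partitioned by key.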
Figure 2.23 illustrates the system architecture of Nephele/PACT. A PACT
program is submitted to the PACT compiler, which translates the program into a
data flow execution plan; this plan is then handed to the Nephele system for
parallel execution. The Hadoop Distributed File System (HDFS) is used for
storing both the input and the output data.
The incoming jobs of Nephele are represented as data flow graphs where vertices
represent subtasks and edges represent communication channels between these
subtasks. Each subtask is a sequential program that reads data from its input
channels and writes to its output channels. Prior to execution, Nephele generates
the parallel data flow graph by spanning the received DAG, that is, vertices are
multiplied to the desired degree of parallelism. Connection patterns that are
attached to channels define how the multiplied vertices are rewired after
spanning. During execution, the Nephele system takes care of resource scheduling,
task distribution, communication, and synchronization issues. Moreover,
Nephele's fault-tolerance mechanisms help to mitigate the impact of hardware
outages. Nephele also offers the ability to annotate the input jobs with a rich
set of parameters that can influence the physical execution. For example, it is
possible to set the desired degree of data parallelism for each subtask, assign
particular sets of subtasks to particular sets of compute nodes, or explicitly
specify the type of communication channels between subtasks. Nephele also
supports three different types of communication channels:
[Figure: PACT program -> PACT compiler -> Data flow program -> Nephele execution engine; Distributed filesystem (HDFS)]
FIGURE 2.23 The Nephele/PACT system architecture. (From A. Alexandrov et al.,
PVLDB, 3(2), 1625-1628, 2010.)
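The spanning step described above, in which each vertex is multiplied to its degree of parallelism and the copies are rewired according to a connection pattern, can be sketched in a few lines of Python. This is a simplified illustration and not Nephele's implementation: the function `span`, the pattern names `"pointwise"` and `"all_to_all"`, and the `name[i]` labels for subtask instances are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def span(parallelism: Dict[str, int],
         edges: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Expand a job graph into the channels of a parallel data flow graph.

    parallelism maps each vertex name to its degree of parallelism.
    edges holds (source, target, pattern) triples; the pattern names
    "pointwise" and "all_to_all" are hypothetical labels for this sketch.
    """
    channels = []
    for source, target, pattern in edges:
        n_src, n_dst = parallelism[source], parallelism[target]
        if pattern == "pointwise":
            # The i-th source instance is wired to the i-th target instance.
            if n_src != n_dst:
                raise ValueError("pointwise needs equal parallelism")
            pairs = [(i, i) for i in range(n_src)]
        elif pattern == "all_to_all":
            # Every source instance is wired to every target instance.
            pairs = [(i, j) for i in range(n_src) for j in range(n_dst)]
        else:
            raise ValueError(f"unknown pattern: {pattern}")
        channels.extend((f"{source}[{i}]", f"{target}[{j}]") for i, j in pairs)
    return channels


# Usage: a three-vertex job in which every vertex runs with parallelism 2.
job_channels = span(
    {"map": 2, "reduce": 2, "sink": 2},
    [("map", "reduce", "all_to_all"), ("reduce", "sink", "pointwise")],
)
```

Here the all-to-all edge contributes four channels and the pointwise edge two, so the spanned graph has six channels between six subtask instances.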