3.6 Features of gUSE to Support Dataflow Patterns
Based on the primitive workflow patterns introduced in Sect. 3.1, we can define
more complex workflow patterns in gUSE.
3.6.1 Generator Property
Considering a higher level of parallelism, ports can represent sets of files instead
of single files. In the case of output ports this means that the job will generate
multiple files with the given internal file name as prefix, extended by a unique id,
and an index starting at 0 as their postfix. For instance, if the internal file name
of a generator port is set to output.txt, then the interpreter will expect the set of
files to be generated as output.txt_0, output.txt_1, etc. (Fig. 3.1 b). We call a node
containing at least one generator output port a generator node.
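The naming rule above can be illustrated with a minimal Python sketch. It is not gUSE code: the function name generator_output_names is hypothetical, and the sketch follows only the output.txt_0, output.txt_1 pattern of the example, so the unique id mentioned in the text is left out.

```python
from typing import List


def generator_output_names(internal_name: str, count: int) -> List[str]:
    """Return the file names expected from a generator port (hypothetical helper).

    Assumption: names follow the pattern of the example in the text,
    i.e. the internal file name, an underscore, and an index starting at 0.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return [f"{internal_name}_{index}" for index in range(count)]


if __name__ == "__main__":
    # Three generated items for a port whose internal file name is output.txt.
    print(generator_output_names("output.txt", 3))
```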
3.6.2 Collector Property
Input ports can be set to collect all the items of the output set that match the given
file name prefix, and to start one job instance only (if sets of other ports do not
interfere with it). This behaviour is called the collector property, and it implements
the dataflow pattern shown in Fig. 3.1 c. We call a node containing at least one
collector input port a collector node. Notice that a collector input port should always
be connected to the output port of a PS node; its purpose is to collect the
N individual output files produced by the N instances of the PS node.
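As a rough illustration of the collector behaviour, the following Python sketch gathers every file whose name matches a given prefix followed by an index, so that a single job instance can consume them all. The helper collect_by_prefix and the sample file names are assumptions made for this example and are not part of gUSE.

```python
import re
from typing import Iterable, List


def collect_by_prefix(prefix: str, available: Iterable[str]) -> List[str]:
    """Return all names of the form <prefix>_<index>, ordered by index.

    Hypothetical helper: it mimics a collector input port that gathers
    the N files produced by the N instances of the preceding node.
    """
    pattern = re.compile(re.escape(prefix) + r"_(\d+)")
    matched = []
    for name in available:
        match = pattern.fullmatch(name)
        if match:
            matched.append((int(match.group(1)), name))
    # Sort numerically by index so that _10 comes after _2.
    return [name for _, name in sorted(matched)]


if __name__ == "__main__":
    files = ["output.txt_1", "output.txt_0", "other.dat_0", "output.txt_10"]
    collected = collect_by_prefix("output.txt", files)
    # One job instance receives the whole collected set.
    print(collected)
```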
3.6.3 Generating Input Datasets
Since multiple ports can be associated with a job, and as each of the ports can
represent a set of files, we need to define relations among these sets. Then
the interpreter can count the proper number of job instances to be executed on each
item of the generated parameter field. An obvious strategy is to create the Cartesian
product of the file sets, resulting in ordered tuples that contain one file selected
from each set. In general, $X(P_1, \dots, P_n) = \{(p_1, \dots, p_n) : p_i \in P_i\}$.
However, creating Cartesian products, or in gUSE terminology, cross products,
covers the whole parameter field. The gUSE system is also able to generate fields
following a different strategy called dot products. It means the pairing of inputs
according to the common index of enumerated members of constituent input
datasets. If the size of one constituent dataset is less than the size of the largest set ...
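The two pairing strategies can be compared with a short Python sketch. The function names cross_product and dot_product and the sample file names are assumptions for illustration only. The dot product here is restricted to equally sized sets, because the handling of sets of different sizes is not covered in this sketch.

```python
from itertools import product
from typing import List, Sequence, Tuple


def cross_product(*ports: Sequence[str]) -> List[Tuple[str, ...]]:
    """Cartesian (cross) product: every combination of one file per port."""
    return list(product(*ports))


def dot_product(*ports: Sequence[str]) -> List[Tuple[str, ...]]:
    """Dot product: pair the files that share the same index.

    Assumption: all sets have the same size; unequal sizes are rejected
    rather than guessed at.
    """
    if len({len(port) for port in ports}) > 1:
        raise ValueError("this sketch only covers equally sized sets")
    return list(zip(*ports))


if __name__ == "__main__":
    a = ["a_0", "a_1", "a_2"]
    b = ["b_0", "b_1", "b_2"]
    # Each tuple corresponds to one job instance.
    print(len(cross_product(a, b)), "job instances with cross product")
    print(len(dot_product(a, b)), "job instances with dot product")
```

With two ports of three files each, the cross product yields 3 x 3 = 9 job instances, while the dot product yields 3, one per shared index.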