such collective communications with associated calculation (i.e., summation in this example) that are called reduction operations. Users thus do not have to program these from scratch. The time cost of these built-in reduction operations is typically of order $O(\log_2 P)$.

As an alternative to the above data-parallel approach to the example of the composite trapezoidal integration rule, let us now consider another approach that is more in the style of task parallelism. To this purpose, let us first note the following relation:
$$\int_a^b f(x)\,dx \;=\; \sum_{j=0}^{P-1} \int_{X_j}^{X_{j+1}} f(x)\,dx, \qquad (10.11)$$
where $a = X_0 < X_1 < X_2 < \cdots < X_{P-1} < X_P = b$ is an increasing sequence of $x$-coordinates that coincide with a subset of the sampling points $\{x_i\}$. In other words, formula (10.11) divides the integration domain $[a,b]$ into $P$ segments. The following new parallelization approach is motivated by the fact that a serial implementation of the composite trapezoidal rule probably already exists, such as Algorithm 6.2. The idea now is to let processor $p$ call the serial trapezoidal$(X_p, X_{p+1}, f, n_p)$ function as an independent task to approximate $\int_{X_p}^{X_{p+1}} f(x)\,dx$. Afterward,
the global sum is obtained by a reduction operation involving all the $P$ processors. We have to ensure that $n_p$ this time gives a division of the $n$ sampling intervals, instead of dividing the $n-1$ inner points. (For computing the $X_p$ values that match the $P$-way division of the $n$ intervals, we refer the reader to Exercise 10.4.) The advantage is the reuse of an existing serial code, plus a more compact parallel implementation. The actual computation on each processor is done by a single call to the trapezoidal$(X_p, X_{p+1}, f, n_p)$ function instead of a for-loop. The disadvantage is a small number of duplicated function evaluations and floating-point arithmetic operations, which arise because the computation of $0.5\,f(X_p)$ is carried out on both processor $p-1$ and processor $p$. In practical cases, where $n$ is very large, the cost of the duplicated operations can be neglected.
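The task-parallel scheme above can be sketched as follows. The function names and the $P$-way interval division are illustrative (the book's Algorithm 6.2 and Exercise 10.4 give the actual serial routine and the computation of the $X_p$ values); each call to trapezoidal would run as an independent task on processor $p$, and the final reduction is emulated here by a plain sum over the partial results:

```python
def trapezoidal(a, b, f, n):
    """Serial composite trapezoidal rule with n intervals on [a, b]."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        s += f(a + i * h)
    return h * s

def parallel_trapezoidal(a, b, f, n, P):
    """Task-parallel variant: divide the n sampling intervals into P
    contiguous chunks of n_p intervals each; processor p integrates
    [X_p, X_{p+1}] by one call to the serial routine, and a final
    reduction combines the P partial results."""
    # P-way division of the n intervals (cf. Exercise 10.4)
    counts = [n // P + (1 if p < n % P else 0) for p in range(P)]
    h = (b - a) / n
    X = [a]
    for n_p in counts:
        X.append(X[-1] + n_p * h)
    X[-1] = b  # guard against round-off in the last boundary
    # each call below is an independent task on "processor" p
    partials = [trapezoidal(X[p], X[p + 1], f, counts[p]) for p in range(P)]
    return sum(partials)  # the reduction step
```

Note that the endpoint values $f(X_p)$ are evaluated twice, once in each of the two neighboring tasks, exactly as discussed above.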
10.2.5 Example 3 of Data Parallelism
The above example of computing the composite trapezoidal rule in parallel only
requires one collective communication in the form of a reduction operation at the
end of the computation. Let us now look at another example where inter-processor
collaboration is more frequent.
Consider an explicit numerical scheme for solving the 1D diffusion problem (7.82)-(7.85) from Sect. 7.4. The finite difference formula for the $n-1$ inner points is, as already given in (7.91),
$$u_i^{\ell+1} \;=\; u_i^{\ell} + \frac{\Delta t}{\Delta x^2}\left(u_{i-1}^{\ell} - 2u_i^{\ell} + u_{i+1}^{\ell}\right) + \Delta t\, f(x_i, t_\ell) \quad \text{for } i = 1, 2, \ldots, n-1. \qquad (10.12)$$
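A minimal sketch of one time step of scheme (10.12), assuming the boundary values $u_0$ and $u_n$ are held fixed by the boundary conditions of the problem (the function name and argument layout are illustrative, not from the book):

```python
def diffusion_step(u, dt, dx, f, x, t):
    """One explicit step of scheme (10.12): update the n-1 inner
    points from time level l (list u) to level l+1. The boundary
    entries u[0] and u[-1] are copied unchanged, assumed to be
    handled by the boundary conditions."""
    n = len(u) - 1
    r = dt / dx**2
    u_new = u.copy()
    for i in range(1, n):
        u_new[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1]) + dt * f(x[i], t)
    return u_new
```

Each inner point $u_i^{\ell+1}$ depends on its two neighbors at level $\ell$, which is what makes the inter-processor collaboration in this example more frequent: in a distributed setting the boundary values of each subdomain must be exchanged between neighboring processors at every time step.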