such collective communications with associated calculation (i.e., summation in this example) that are called reduction operations. Users thus do not have to program these from scratch. The time cost of these built-in reduction operations is typically on the order of $O(\log_2 P)$.

As an alternative to the above data-parallel approach to the example of the composite trapezoidal integration rule, let us now consider another approach that is more in the style of task parallelism. To this purpose, let us first note the following relation:
$$\int_a^b f(x)\,dx = \sum_{j=0}^{P-1} \int_{X_j}^{X_{j+1}} f(x)\,dx, \qquad (10.11)$$
where
$$a = X_0 < X_1 < X_2 < \cdots < X_{P-1} < X_P = b$$
is an increasing sequence of $x$-coordinates that coincide with a subset of the sampling points $\{x_i\}$. In other words, formula (10.11) divides the integral domain $[a,b]$ into $P$ segments. The following new parallelization approach is motivated by the fact that a serial implementation of the composite trapezoidal rule probably already exists, such as Algorithm 6.2. The idea now is to let processor $p$ call the serial $\mathrm{trapezoidal}(X_p, X_{p+1}, f, n_p)$ function as an independent task to approximate $\int_{X_p}^{X_{p+1}} f(x)\,dx$. Afterward, the global sum is obtained by a reduction operation involving all the $P$ processors.
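As a quick sanity check of relation (10.11), the sketch below compares a whole-domain integral with the sum of $P$ segment integrals. The `trapezoidal` helper is a hypothetical stand-in for the book's serial routine (Algorithm 6.2), and the uniform choice of the $X_p$ values is only for illustration:

```python
import math

def trapezoidal(a, b, f, n):
    # Hypothetical stand-in for the serial composite trapezoidal rule
    # (Algorithm 6.2): n intervals, n + 1 sampling points on [a, b].
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

a, b, n, P = 0.0, math.pi, 1000, 4
X = [a + p * (b - a) / P for p in range(P + 1)]  # a = X_0 < ... < X_P = b
whole = trapezoidal(a, b, math.sin, n)
parts = sum(trapezoidal(X[p], X[p + 1], math.sin, n // P) for p in range(P))
# whole and parts agree to rounding error, since every X_p lies on the
# global grid of sampling points.
```

Because each $X_p$ coincides with a global sampling point, the two computations use exactly the same function values and differ only in summation order.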
We have to ensure that $n_p$ this time gives a division of the $n$ sampling intervals, instead of dividing the $n-1$ inner points. (For computing the $X_p$ values that match the $P$-way division of the $n$ intervals, we refer the reader to Exercise 10.4.) The advantage is the reuse of an existing serial code, plus a more compact parallel implementation. The actual computation on each processor is done by a single call to the $\mathrm{trapezoidal}(X_p, X_{p+1}, f, n_p)$ function instead of a for-loop. The disadvantage is a small number of duplicated function evaluations and floating-point arithmetic operations, which arise because the computation of $0.5f(X_p)$ is carried out on both processor $p-1$ and processor $p$. In practical cases, where $n$ is very large, the cost of the duplicated operations can be neglected.
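The whole task-parallel approach can be sketched as follows, with a Python list standing in for the $P$ processors and a recursive-doubling loop standing in for the built-in reduction operation. The names, the uniform splitting of the $n$ intervals, and the assumption that $P$ divides $n$ are all illustrative; a real implementation would use a message-passing library:

```python
import math

def trapezoidal(a, b, f, n):
    # Hypothetical serial routine in the role of Algorithm 6.2.
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

def parallel_trapezoidal(a, b, f, n, P):
    # Divide the n sampling intervals (not the n - 1 inner points) among
    # P "processors"; for simplicity, P is assumed to divide n evenly.
    X = [a + p * (b - a) / P for p in range(P + 1)]
    partial = [trapezoidal(X[p], X[p + 1], f, n // P) for p in range(P)]
    # Tree (recursive-doubling) reduction: log2(P) combining rounds,
    # matching the O(log2 P) cost model of built-in reductions.
    step = 1
    while step < P:
        for p in range(0, P, 2 * step):
            if p + step < P:
                partial[p] += partial[p + step]
        step *= 2
    return partial[0]
```

Note that each inner boundary value $f(X_p)$ is evaluated by two neighboring tasks, which is precisely the small duplication discussed above.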
10.2.5 Example 3 of Data Parallelism
The above example of computing the composite trapezoidal rule in parallel only
requires one collective communication in the form of a reduction operation at the
end of the computation. Let us now look at another example where inter-processor
collaboration is more frequent.
Consider an explicit numerical scheme for solving the 1D diffusion problem (7.82)-(7.85) from Sect. 7.4. The finite difference formula for the $n-1$ inner points is, as already given in (7.91),
$$u_i^{\ell+1} = u_i^\ell + \frac{\Delta t}{\Delta x^2}\left(u_{i-1}^\ell - 2u_i^\ell + u_{i+1}^\ell\right) + \Delta t\, f(x_i, t_\ell) \quad \text{for } i = 1, 2, \ldots, n-1. \qquad (10.12)$$
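One time level of scheme (10.12) can be sketched as a single pass over the inner points. The function name and the assumptions $x_i = i\,\Delta x$ and fixed boundary values are illustrative:

```python
def diffusion_step(u, dt, dx, f, t):
    # Advance scheme (10.12) one time level: u holds the n + 1 grid values
    # u_0, ..., u_n at time t_l; the boundary entries u[0] and u[n] are
    # left untouched (assumed set by the boundary conditions).
    n = len(u) - 1
    r = dt / dx**2
    u_new = u[:]
    for i in range(1, n):
        u_new[i] = u[i] + r * (u[i-1] - 2.0*u[i] + u[i+1]) + dt * f(i * dx, t)
    return u_new
```

As a quick consistency check, with $f = 0$ and constant initial data the step leaves $u$ unchanged; stability of the explicit scheme additionally requires $\Delta t \le \Delta x^2/2$.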