Table 10.1 Speedup results for computing the composite trapezoidal integration rule using an
MPI program

P        1          2          4          8           16
T(P)     0.03909    0.02331    0.01599    0.008227    0.004398
S(P)     N/A        1.68       2.44       4.75        8.89
η(P)     N/A        0.84       0.61       0.59        0.56
The MPI program implements the composite trapezoidal integration rule; see Sect. 10.3.2. The
integral to be approximated is $\int_0^1 \sin(x)\,dx$, using $n = 10^6$ sampling intervals.
Due to the relatively small problem size and the slow interconnect, the speedup results in
Table 10.1 are not very impressive. This example shows that communication overhead is an
obstacle to good parallel efficiency, which typically deteriorates with increasing P.
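For concreteness, the S(P) and η(P) columns in Table 10.1 are consistent with the usual
definitions of speedup and parallel efficiency, $S(P) = T(1)/T(P)$ and $\eta(P) = S(P)/P$
(assumed here to be the definitions used earlier in the chapter). For example, for $P = 16$,
\[
S(16) = \frac{T(1)}{T(16)} = \frac{0.03909}{0.004398} \approx 8.89,
\qquad
\eta(16) = \frac{S(16)}{16} \approx 0.56 .
\]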
In general, considering all the factors that have an impact on speedup, we should expect that
a fixed-size problem often has an upper limit on the number of processors, beyond which the
speedup will decrease instead of increase.
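A simple cost model illustrates why such a limit appears; the model is an assumption chosen
for illustration, not fitted to the measurements in Table 10.1. Suppose the parallelizable
work $T(1)$ is divided evenly among the $P$ processors, while the communication overhead
grows roughly linearly with $P$ at a rate $\tau$ per processor. Then
\[
T(P) \approx \frac{T(1)}{P} + \tau P
\quad\Longrightarrow\quad
S(P) = \frac{T(1)}{T(P)} = \frac{P}{1 + \tau P^2/T(1)},
\]
so the speedup grows only up to $P^\ast \approx \sqrt{T(1)/\tau}$ and decreases for larger $P$.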
10.3 Parallel Programming
So far we have discussed how serial scientific computations can be transformed into their
parallel counterparts. The basic steps include parallelism identification, work division, and
inter-processor collaboration. For the resulting computations to run on parallel hardware,
the code must be implemented accordingly. The hope of many people for an automatic tool that
can analyze a serial code, find the parallelism, and insert parallelization commands has
proved to be too ambitious, at least for scientific codes of reasonable complexity. Therefore,
some form of manual parallel programming is needed. There are currently three main forms of
parallel programming: (1) recoding in a specially designed parallel programming language,
(2) annotating a serial code with compiler directives that give the compiler hints for its
parallelization tasks, and (3) restructuring a serial code and inserting calls to library
functions that explicitly enforce inter-processor collaboration.
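As a minimal sketch of option (2), consider a C implementation of the trapezoidal example
behind Table 10.1; the loop below and its variable names are illustrative assumptions, not
the code of Sect. 10.3.2. A single compiler directive is enough to ask an OpenMP-capable
compiler to parallelize the summation loop:

/* Illustrative sketch of option (2): a serial trapezoidal loop kept
 * intact, annotated with one OpenMP directive. Compile with an
 * OpenMP-capable compiler, e.g. gcc -fopenmp. */
#include <math.h>
#include <stdio.h>

int main(void)
{
  const int n = 1000000;            /* number of sampling intervals */
  const double h = 1.0 / n;         /* uniform interval length */
  double sum = 0.5 * (sin(0.0) + sin(1.0));   /* endpoint contributions */
  int i;

  /* The directive splits the loop iterations among the threads and
   * combines the per-thread partial sums via the reduction clause. */
  #pragma omp parallel for reduction(+:sum)
  for (i = 1; i < n; i++)
    sum += sin(i * h);

  printf("Trapezoidal approximation: %.10f\n", sum * h);
  return 0;
}

The number of threads that share the loop iterations is typically controlled by the
environment variable OMP_NUM_THREADS; the serial code itself is left untouched, which is the
main appeal of the directive-based approach.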
The present section considers the latter two options. More specifically, we will briefly
explain the use of OpenMP [1] and MPI [14, 26], which target shared-memory and
distributed-memory systems, respectively. Both are well-established standards of application
programming interfaces for parallelization and thus provide code portability. For the
newcomer to parallel programming, the most important thing is not the syntax details, which
can easily be found in textbooks or online resources; rather, it is important to see that
parallel programming requires a "mental picture" of multiple execution streams, which often
need to communicate with each other by data exchange and/or synchronization.
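To make this mental picture concrete, the following hedged sketch of option (3) uses
illustrative C code (again not the program of Sect. 10.3.2, and with a cyclic rather than
blockwise division of the sampling points): every MPI process runs the same program as a
separate execution stream, computes its own partial trapezoidal sum, and an explicit library
call combines the partial results on process 0.

/* Illustrative sketch of option (3): explicit work division and
 * inter-processor collaboration through MPI library calls. */
#include <math.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  const int n = 1000000;            /* total number of sampling intervals */
  const double h = 1.0 / n;
  int rank, size, i;
  double local_sum = 0.0, global_sum;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
  MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */

  /* Cyclic division of the interior sampling points among the processes */
  for (i = rank + 1; i < n; i += size)
    local_sum += sin(i * h);
  if (rank == 0)                    /* endpoint contributions on one process */
    local_sum += 0.5 * (sin(0.0) + sin(1.0));

  /* Explicit collaboration: sum the partial results on process 0 */
  MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
             0, MPI_COMM_WORLD);

  if (rank == 0)
    printf("Trapezoidal approximation: %.10f\n", global_sum * h);

  MPI_Finalize();
  return 0;
}

Such a program is started with a launcher like mpirun -np P, and each of the P processes
executes the same code with its own rank, exchanging data only through the explicit MPI
calls.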