Table 10.1 Speedup results for computing the composite trapezoidal integration rule using an
MPI program

P        1          2          4          8           16
T(P)     0.03909    0.02331    0.01599    0.008227    0.004398
S(P)     N/A        1.68       2.44       4.75        8.89
η(P)     N/A        0.84       0.61       0.59        0.56
The MPI program implements the composite trapezoidal integration rule; see Sect. 10.3.2. The
integral to be approximated is $\int_0^1 \sin(x)\,dx$, using $n = 10^6$ sampling intervals.
Due to the relatively small problem size and the slow interconnect, the speedup results in
Table 10.1 are not very impressive. This example shows that communication overhead is an
obstacle to good parallel efficiency, which typically deteriorates with increasing P.
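For concreteness, the S(P) and η(P) columns in Table 10.1 are consistent with the usual
definitions of speedup and parallel efficiency, $S(P) = T(1)/T(P)$ and $\eta(P) = S(P)/P$
(assumed here to be the definitions used earlier in the chapter). For example, for $P = 16$,
\[
S(16) = \frac{T(1)}{T(16)} = \frac{0.03909}{0.004398} \approx 8.89,
\qquad
\eta(16) = \frac{S(16)}{16} \approx 0.56 .
\]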
In general, considering all the factors that have an impact on speedup, we should expect that
a fixed-size problem often has an upper limit on the number of processors, beyond which the
speedup will decrease instead of increase.
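A simple cost model illustrates why such a limit appears; the model is an assumption chosen
for illustration, not fitted to the measurements in Table 10.1. Suppose the parallelizable
work $T(1)$ is divided evenly among the $P$ processors, while the communication overhead
grows roughly linearly with $P$ at a rate $\tau$ per processor. Then
\[
T(P) \approx \frac{T(1)}{P} + \tau P
\quad\Longrightarrow\quad
S(P) = \frac{T(1)}{T(P)} = \frac{P}{1 + \tau P^2/T(1)},
\]
so the speedup grows only up to $P^\ast \approx \sqrt{T(1)/\tau}$ and decreases for larger $P$.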
10.3 Parallel Programming
So far we have discussed how serial scientific computations can be transformed into their
parallel counterparts. The basic steps include parallelism identification, work division, and
inter-processor collaboration. For the resulting computations to run on parallel hardware,
the code must be implemented accordingly. The hope of many people for an automatic tool that
can analyze a serial code, find the parallelism, and insert parallelization commands has
proved to be too ambitious, at least for scientific codes of reasonable complexity. Therefore,
some form of manual parallel programming is needed. There are currently three main forms of
parallel programming: (1) recoding in a specially designed parallel programming language,
(2) annotating a serial code with compiler directives that give the compiler hints for its
parallelization tasks, and (3) restructuring a serial code and inserting calls to library
functions that explicitly enforce inter-processor collaboration.
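As a minimal sketch of option (2), consider a C implementation of the trapezoidal example
behind Table 10.1; the loop below and its variable names are illustrative assumptions, not
the code of Sect. 10.3.2. A single compiler directive is enough to ask an OpenMP-capable
compiler to parallelize the summation loop:

/* Illustrative sketch of option (2): a serial trapezoidal loop kept
 * intact, annotated with one OpenMP directive. Compile with an
 * OpenMP-capable compiler, e.g. gcc -fopenmp. */
#include <math.h>
#include <stdio.h>

int main(void)
{
  const int n = 1000000;            /* number of sampling intervals */
  const double h = 1.0 / n;         /* uniform interval length */
  double sum = 0.5 * (sin(0.0) + sin(1.0));   /* endpoint contributions */
  int i;

  /* The directive splits the loop iterations among the threads and
   * combines the per-thread partial sums via the reduction clause. */
  #pragma omp parallel for reduction(+:sum)
  for (i = 1; i < n; i++)
    sum += sin(i * h);

  printf("Trapezoidal approximation: %.10f\n", sum * h);
  return 0;
}

The number of threads that share the loop iterations is typically controlled by the
environment variable OMP_NUM_THREADS; the serial code itself is left untouched, which is the
main appeal of the directive-based approach.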
The present section considers the latter two options. More specifically, we will briefly
explain the use of OpenMP [1] and MPI [14, 26], which target shared-memory and
distributed-memory systems, respectively. Both are well-established standards of application
programming interfaces for parallelization and thus provide code portability. For the
newcomer to parallel programming, the most important thing is not the syntax details, which
can easily be found in textbooks or online resources; rather, it is important to see that
parallel programming requires a "mental picture" of multiple execution streams, which often
need to communicate with each other by data exchange and/or synchronization.
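To make this mental picture concrete, the following hedged sketch of option (3) uses
illustrative C code (again not the program of Sect. 10.3.2, and with a cyclic rather than
blockwise division of the sampling points): every MPI process runs the same program as a
separate execution stream, computes its own partial trapezoidal sum, and an explicit library
call combines the partial results on process 0.

/* Illustrative sketch of option (3): explicit work division and
 * inter-processor collaboration through MPI library calls. */
#include <math.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  const int n = 1000000;            /* total number of sampling intervals */
  const double h = 1.0 / n;
  int rank, size, i;
  double local_sum = 0.0, global_sum;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
  MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */

  /* Cyclic division of the interior sampling points among the processes */
  for (i = rank + 1; i < n; i += size)
    local_sum += sin(i * h);
  if (rank == 0)                    /* endpoint contributions on one process */
    local_sum += 0.5 * (sin(0.0) + sin(1.0));

  /* Explicit collaboration: sum the partial results on process 0 */
  MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
             0, MPI_COMM_WORLD);

  if (rank == 0)
    printf("Trapezoidal approximation: %.10f\n", global_sum * h);

  MPI_Finalize();
  return 0;
}

Such a program is started with a launcher like mpirun -np P, and each of the P processes
executes the same code with its own rank, exchanging data only through the explicit MPI
calls.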