On a distributed-memory system, the parallelization is a bit more complex. We recall that a processor with only a local memory should avoid allocating global data structures if possible. So processor $p$ should ideally only operate on two local arrays, $u_p^{\ell}$ and $u_p^{\ell+1}$, both of length $n_p$, which contain, respectively, the assigned segments of the $u_i^{\ell}$ and $u_i^{\ell+1}$ values. The data segmentation can use the formulas (10.8) and (10.9). However, as indicated by (10.12), computing the leftmost and rightmost values of $u_p^{\ell+1}$ requires one $u^{\ell}$ value each from the two neighboring subdomains. To avoid costly if-tests when computing the leftmost and rightmost values of $u^{\ell+1}$, it is therefore more convenient to extend the two local arrays by one value at both ends. These two additional points are called ghost points, whose values participate in the owner subdomain's computations but are provided by the neighboring subdomains through communication. We refer the reader to Fig. 10.3 for an illustration. It should also be noted that on processor 0 the left ghost point coincides with the left physical boundary point $x = 0$, whereas on processor $P-1$ the right ghost point coincides with the right physical boundary point $x = 1$.
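To make the ghost-point layout concrete, the following minimal sketch in C (not the book's Algorithm 10.1) shows one local time step on processor $p$, assuming an explicit scheme of the form $u_i^{\ell+1} = u_i^{\ell} + \alpha(u_{i-1}^{\ell} - 2u_i^{\ell} + u_{i+1}^{\ell})$ behind (10.12); the names u, u_new, n_p, and alpha are illustrative assumptions.

/* One local time step on processor p (illustrative sketch).
 * The local arrays have length n_p + 2: indices 0 and n_p + 1 hold the
 * ghost values, so the loop over the inner points 1..n_p reads its left
 * and right neighbors without any if-tests at the ends. */
void local_time_step(const double *u, double *u_new, int n_p, double alpha)
{
    for (int i = 1; i <= n_p; i++)
        u_new[i] = u[i] + alpha * (u[i-1] - 2.0*u[i] + u[i+1]);
}

Because the ghost slots are filled by communication before each step, the loop body is identical for every inner point, including the two endpoints.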
Compared with a corresponding implementation on a shared-memory system, the
implementation on a distributed-memory system is different in its inter-processor
communications, in addition to using local data arrays. Here, each pair of neigh-
boring subdomains has to explicitly exchange one nodal value per time step. For
example, the value of $u_{p,1}^{\ell+1}$ that is computed on processor $p$ needs to be sent to processor $p-1$, which in return sends back the computed value on its rightmost inner point. A similar data exchange takes place between processors $p$ and $p+1$. The data exchanges need to be carried out before proceeding to the next time step. The resulting communications involve pairs of processors, commonly known as one-to-one communications. In this example, these one-to-one communications implicitly ensure that the computations on the neighboring processors are synchronized. There is therefore no need for a separate barrier operation, which is required in the shared-memory implementation. The complete numerical scheme suitable for a distributed-memory computer is given in Algorithm 10.1.
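One possible realization of these one-to-one exchanges is sketched below in C with MPI; this is an assumption on our part, not necessarily how Algorithm 10.1 is formulated, and the function name exchange_ghost_points and the array layout (ghost values at indices 0 and n_p + 1) are illustrative.

#include <mpi.h>

/* Per-time-step ghost-point exchange (illustrative sketch). Each pair of
 * neighboring processors swaps one nodal value; MPI_Sendrecv avoids deadlock
 * and provides the implicit pairwise synchronization described in the text.
 * On processors 0 and P-1, MPI_PROC_NULL turns the boundary-facing transfers
 * into no-ops, since those ghost points coincide with the physical boundary. */
void exchange_ghost_points(double *u, int n_p, int p, int P, MPI_Comm comm)
{
    int left  = (p > 0)     ? p - 1 : MPI_PROC_NULL;
    int right = (p < P - 1) ? p + 1 : MPI_PROC_NULL;

    /* Send the leftmost inner value u[1] to the left neighbor and receive
       its rightmost inner value into the left ghost point u[0]. */
    MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                 &u[0], 1, MPI_DOUBLE, left, 0, comm, MPI_STATUS_IGNORE);

    /* Send the rightmost inner value u[n_p] to the right neighbor and receive
       its leftmost inner value into the right ghost point u[n_p + 1]. */
    MPI_Sendrecv(&u[n_p], 1, MPI_DOUBLE, right, 0,
                 &u[n_p + 1], 1, MPI_DOUBLE, right, 0, comm, MPI_STATUS_IGNORE);
}

Calling exchange_ghost_points before each local_time_step reproduces the pattern described above: the blocking pairwise exchange itself keeps neighboring processors in step, so no separate barrier is needed.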
Remarks
To a beginner, whether or not parallelism exists in a computational problem can
seem mysterious. A rule of thumb is that there must be a sufficient amount of
computational work and also that (parts of) the computations must not be dependent
on each other.
Fig. 10.3 An example of a subdomain that is assigned to processor $p$. The assigned mesh points consist of two ghost points and a set of inner points.
 