On a distributed-memory system, the parallelization is a bit more complex. We recall that a processor with only a local memory should avoid allocating global data structures if possible. So processor $p$ should ideally only operate on two local arrays, $u_p^{\ell}$ and $u_p^{\ell+1}$, both of length $n_p$, which contain, respectively, the assigned segments of the $u_i^{\ell}$ and $u_i^{\ell+1}$ values. The data segmentation can use the formulas (10.8) and (10.9). However, as indicated by (10.12), computing the leftmost and rightmost values of $u_p^{\ell+1}$ requires one $u^{\ell}$ value each from the two neighboring subdomains. To avoid costly if-tests when computing the leftmost and rightmost values of $u^{\ell+1}$, it is therefore more convenient to extend the two local arrays by one value at both ends. These two additional points are called ghost points, whose values participate in the owner subdomain's computations but are provided by the neighboring subdomains through communication. We refer the reader to Fig. 10.3 for an illustration. It should also be noted that on processor 0 the left ghost point coincides with the left physical boundary point $x = 0$, whereas on processor $P-1$ the right ghost point coincides with the right physical boundary point $x = 1$.
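To make the ghost-point layout concrete, the following minimal sketch in C (not the book's Algorithm 10.1) shows one local time step on processor $p$, assuming an explicit scheme of the form $u_i^{\ell+1} = u_i^{\ell} + \alpha(u_{i-1}^{\ell} - 2u_i^{\ell} + u_{i+1}^{\ell})$ behind (10.12); the names u, u_new, n_p, and alpha are illustrative assumptions.

/* One local time step on processor p (illustrative sketch).
 * The local arrays have length n_p + 2: indices 0 and n_p + 1 hold the
 * ghost values, so the loop over the inner points 1..n_p reads its left
 * and right neighbors without any if-tests at the ends. */
void local_time_step(const double *u, double *u_new, int n_p, double alpha)
{
    for (int i = 1; i <= n_p; i++)
        u_new[i] = u[i] + alpha * (u[i-1] - 2.0*u[i] + u[i+1]);
}

Because the ghost slots are filled by communication before each step, the loop body is identical for every inner point, including the two endpoints.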
Compared with a corresponding implementation on a shared-memory system, the
implementation on a distributed-memory system is different in its inter-processor
communications, in addition to using local data arrays. Here, each pair of neigh-
boring subdomains has to explicitly exchange one nodal value per time step. For
example, the value of $u_{p,1}^{\ell+1}$ that is computed on processor $p$ needs to be sent to processor $p-1$, which in return sends back the computed value on its rightmost inner point. A similar data exchange takes place between processors $p$ and $p+1$. The data exchanges need to be carried out before proceeding to the next time step. The resulting communications involve pairs of processors, commonly known as one-to-one communications. In this example, these one-to-one communications implicitly ensure that the computations on the neighboring processors are synchronized. There is therefore no need for a separate barrier operation, which is required in the shared-memory implementation. The complete numerical scheme suitable for a distributed-memory computer is given in Algorithm 10.1.
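One possible realization of these one-to-one exchanges is sketched below in C with MPI; this is an assumption on our part, not necessarily how Algorithm 10.1 is formulated, and the function name exchange_ghost_points and the array layout (ghost values at indices 0 and n_p + 1) are illustrative.

#include <mpi.h>

/* Per-time-step ghost-point exchange (illustrative sketch). Each pair of
 * neighboring processors swaps one nodal value; MPI_Sendrecv avoids deadlock
 * and provides the implicit pairwise synchronization described in the text.
 * On processors 0 and P-1, MPI_PROC_NULL turns the boundary-facing transfers
 * into no-ops, since those ghost points coincide with the physical boundary. */
void exchange_ghost_points(double *u, int n_p, int p, int P, MPI_Comm comm)
{
    int left  = (p > 0)     ? p - 1 : MPI_PROC_NULL;
    int right = (p < P - 1) ? p + 1 : MPI_PROC_NULL;

    /* Send the leftmost inner value u[1] to the left neighbor and receive
       its rightmost inner value into the left ghost point u[0]. */
    MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                 &u[0], 1, MPI_DOUBLE, left, 0, comm, MPI_STATUS_IGNORE);

    /* Send the rightmost inner value u[n_p] to the right neighbor and receive
       its leftmost inner value into the right ghost point u[n_p + 1]. */
    MPI_Sendrecv(&u[n_p], 1, MPI_DOUBLE, right, 0,
                 &u[n_p + 1], 1, MPI_DOUBLE, right, 0, comm, MPI_STATUS_IGNORE);
}

Calling exchange_ghost_points before each local_time_step reproduces the pattern described above: the blocking pairwise exchange itself keeps neighboring processors in step, so no separate barrier is needed.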
Remarks
To a beginner, whether or not parallelism exists in a computational problem can
seem mysterious. A rule of thumb is that there must be a sufficient amount of
computational work and also that (parts of) the computations must not be dependent
on each other.
Fig. 10.3 An example of a subdomain that is assigned to processor $p$. The assigned mesh points consist of two ghost points and a set of inner points.
 