In addition, the Dirichlet boundary condition (7.83), i.e., $u(0,t) = D_0(t)$, is realized as

$$u_0^{\ell+1} = D_0(t_{\ell+1}), \qquad (10.13)$$
whereas the Neumann boundary condition (7.84), i.e., $\partial u/\partial x(1,t) = N_1(t)$, is realized as

$$u_n^{\ell+1} = u_n^\ell + 2\frac{\Delta t}{\Delta x^2}\left(u_{n-1}^\ell - u_n^\ell + N_1(t_\ell)\Delta x\right) + \Delta t\, f(1, t_\ell); \qquad (10.14)$$

see Sect. 7.4.3 for the details.
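Assuming the model problem from Sect. 7.4 is the heat equation $u_t = u_{xx} + f$ on $(0,1)$, with (10.12) as the standard explicit interior update, one full time step combining (10.12)–(10.14) might be sketched as follows (the function and parameter names are illustrative, not from the text):

```python
def step(u, t, dt, dx, f, D0, N1):
    """One explicit time step for u_t = u_xx + f on (0,1).

    Inner points follow the interior scheme (10.12); the left and right
    boundary points follow (10.13) and (10.14), respectively.
    """
    n = len(u) - 1                 # mesh points x_0, ..., x_n
    r = dt / dx**2
    u_new = [0.0] * (n + 1)
    for i in range(1, n):          # inner points, scheme (10.12)
        u_new[i] = u[i] + r * (u[i-1] - 2*u[i] + u[i+1]) + dt * f(i*dx, t)
    u_new[0] = D0(t + dt)          # Dirichlet boundary, scheme (10.13)
    # Neumann boundary at x = 1, scheme (10.14)
    u_new[n] = u[n] + 2*r*(u[n-1] - u[n] + N1(t)*dx) + dt * f(1.0, t)
    return u_new
```

Note that only the loop over the inner points carries the bulk of the work; the two boundary updates are $O(1)$ per step.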
Parallelism in formula (10.12) is due to the fact that the computations on any two inner points $u_i^{\ell+1}$ and $u_j^{\ell+1}$ are independent of each other. More specifically, to compute the value of $u_i^{\ell+1}$, we rely on three nodal values from the previous time step: $u_{i-1}^\ell$, $u_i^\ell$, and $u_{i+1}^\ell$, thus none of the other $u^{\ell+1}$ values. The $n-1$ inner points can
actually be updated simultaneously for the same time level $\ell+1$. Work division is the same as partitioning these inner points. If the index set $\{1, 2, \ldots, n-1\}$ is segmented into $P$ contiguous pieces, the work division is equivalent to decomposing the solution domain into $P$ subdomains, as is evident in Fig. 10.2. Actually, for many PDE problems, it is more natural and general to let domain decomposition give rise to the work and data division, so that each processor is assigned a subdomain.
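Such a contiguous segmentation of the index set $\{1, 2, \ldots, n-1\}$ among $P$ processors can be sketched as follows (the helper name is only illustrative):

```python
def partition_inner_points(n, P):
    """Split the inner indices {1, ..., n-1} into P contiguous pieces
    whose sizes differ by at most one."""
    m = n - 1                        # number of inner points
    pieces = []
    for p in range(P):
        start = 1 + p * m // P       # first inner index owned by piece p
        stop = 1 + (p + 1) * m // P  # one past the last index of piece p
        pieces.append(list(range(start, stop)))
    return pieces
```

For example, $n = 10$ and $P = 3$ yield the pieces $[1,2,3]$, $[4,5,6]$, and $[7,8,9]$, i.e., three equally sized groups of inner points, matching the subdomains of Fig. 10.2.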
After a work division is decided, the local work of each processor at time step $\ell+1$ is to compute $u_i^{\ell+1}$ using (10.12) for a subset of the $i$ indices $\{1, 2, \ldots, n-1\}$.
In addition, the processor that is responsible for $x_0$ has to update $u_0^{\ell+1}$ using (10.13), and similarly the processor responsible for $x_n$ needs to compute $u_n^{\ell+1}$ using
(10.14). It should be stressed again that concurrency in formula (10.12) assumes that the inner points on the same time level are updated. In other words, no processor should be allowed to proceed to the next time step before all the other processors have finished the current time step. Otherwise the computational results will be incorrect. Such coordination among the processors can typically be achieved by a built-in synchronization operation called a barrier, which forces all processors to wait for the slowest one. On a shared-memory system, the barrier operation is the only needed inter-processor communication. All nodal values of $u^\ell$ are accessible by all processors on a shared-memory architecture, and therefore there is no need to communicate while each processor is computing its assigned portion of $u^{\ell+1}$.
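This barrier-based coordination can be illustrated with Python threads; because of the interpreter's global lock the sketch demonstrates the coordination pattern rather than actual speedup, and the function names and the model problem $u_t = u_{xx} + f$ are assumptions, not taken from the text:

```python
import threading

def solve_shared_memory(n, num_steps, dt, dx, f, D0, N1, u_init, P):
    """Explicit time stepping with P workers sharing one mesh.

    Two buffers alternate roles: at time step `step`, all workers read
    u[step % 2] and write u[(step + 1) % 2], so a single barrier per
    step suffices to keep every worker on the same time level.
    """
    u = [list(u_init), list(u_init)]
    barrier = threading.Barrier(P)
    r = dt / dx**2
    m = n - 1                       # number of inner points

    def worker(p):
        # Contiguous piece of the inner indices {1, ..., n-1} for worker p.
        my_indices = range(1 + p * m // P, 1 + (p + 1) * m // P)
        for step in range(num_steps):
            src, dst = u[step % 2], u[(step + 1) % 2]
            t = step * dt
            for i in my_indices:    # inner points, scheme (10.12)
                dst[i] = src[i] + r * (src[i-1] - 2*src[i] + src[i+1]) \
                         + dt * f(i*dx, t)
            if p == 0:              # Dirichlet boundary, scheme (10.13)
                dst[0] = D0(t + dt)
            if p == P - 1:          # Neumann boundary, scheme (10.14)
                dst[n] = src[n] + 2*r*(src[n-1] - src[n] + N1(t)*dx) \
                         + dt * f(1.0, t)
            barrier.wait()          # nobody starts step+1 before all finish step

    threads = [threading.Thread(target=worker, args=(p,)) for p in range(P)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return u[num_steps % 2]
```

With $P = 1$ the same function reduces to a serial solver, which makes it easy to check that runs with different $P$ produce identical results, as the discussion above requires.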
Fig. 10.2 An example of partitioning the 1D computational domain $x \in (0,1)$. The leftmost and rightmost mesh points, $x_0$ at $x = 0$ and $x_n$ at $x = 1$, are marked for treatment of the physical boundary conditions, whereas the inner points are divided fairly among the processors