are assigned to the different threads cyclically. An appropriate choice of work division and scheduling, with a suitable value of chunksize, is important for parallel performance.
Example 10.7. Next, let us look at the most important section of an OpenMP implementation in C/C++ of the 1D diffusion problem (7.82)-(7.85), for which the explicit numerical method was given as (10.12)-(10.14).
u_prev = (double*)malloc((n+1)*sizeof(double));
u = (double*)malloc((n+1)*sizeof(double));
t = 0.;
#pragma omp parallel private(k)
{
  #pragma omp for
  for (i=0; i<=n; i++)
    u_prev[i] = I(i*dx);        /* enforce initial condition */
  for (k=1; k<=m; k++) {        /* time integration loop */
    #pragma omp for schedule(static,100)
    for (i=1; i<n; i++)         /* computing inner points */
      u[i] = u_prev[i] + alpha*(u_prev[i-1]-2*u_prev[i]+u_prev[i+1])
           + dt*f(i*dx,t);
    #pragma omp single
    {
      u[n] = u_prev[n]
           + 2*alpha*(u_prev[n-1]-u_prev[n]+N1(t)*dx)
           + dt*f(1,t);         /* right physical boundary condition */
      t += dt;
      u[0] = D0(t);             /* left physical boundary condition */
    }
    #pragma omp for schedule(static,100)
    for (i=0; i<=n; i++)
      u_prev[i] = u[i];         /* data copy before next time step */
  }
}
The above code contains considerably more instances of #pragma than the previous example. Apart from parallelizing the for-loop that enforces the initial condition, two for-loops inside each time step are also parallelized: one computing u^{l+1}_i on the inner points and the other copying array u^{l+1} to array u^l. It can be seen that almost the entire code section is wrapped inside a large parallel region, indicated by #pragma omp parallel. This means that P threads are spawned at the entrance of the parallel region and stay alive throughout the region. This is why the OpenMP directive #pragma omp single is necessary to mark that only one thread does the work of incrementing the shared t variable and enforcing the two physical boundary conditions. Otherwise, letting all the threads repeat the same work would produce erroneous results.
Another possibility is not to use the large parallel region, but instead to use #pragma omp parallel for in the three locations where #pragma omp for now stands. Such