are assigned to the different threads cyclically. An appropriate choice of work division and scheduling, with a suitable value of chunksize, is important for parallel performance.
Example 10.7. Next, let us look at the most important section of an OpenMP implementation in C/C++ of the 1D diffusion problem (7.82)-(7.85), for which the explicit numerical method was given as (10.12)-(10.14).
u_prev = (double*)malloc((n+1)*sizeof(double));
u = (double*)malloc((n+1)*sizeof(double));
t = 0.;
#pragma omp parallel private(k)
{
  #pragma omp for
  for (i=0; i<=n; i++)
    u_prev[i] = I(i*dx);        /* enforce initial condition */
  for (k=1; k<=m; k++) {        /* time integration loop */
    #pragma omp for schedule(static,100)
    for (i=1; i<n; i++)         /* computing inner points */
      u[i] = u_prev[i] + alpha*(u_prev[i-1]-2*u_prev[i]+u_prev[i+1])
           + dt*f(i*dx,t);
    #pragma omp single
    {
      u[n] = u_prev[n]
           + 2*alpha*(u_prev[n-1]-u_prev[n]+N1(t)*dx)
           + dt*f(1,t);         /* right physical boundary condition */
      t += dt;
      u[0] = D0(t);             /* left physical boundary condition */
    }
    #pragma omp for schedule(static,100)
    for (i=0; i<=n; i++)
      u_prev[i] = u[i];         /* data copy before next time step */
  }
}
The above code contains considerably more instances of #pragma than the previous example. Apart from parallelizing the for-loop that enforces the initial condition, two for-loops inside each time step are also parallelized: one computing u^{l+1}_i on the inner points and the other copying array u^{l+1} to array u^l. It can be seen that almost the entire code section is wrapped inside a large parallel region, indicated by #pragma omp parallel. This means that P threads are spawned at the entrance of the parallel region and stay alive throughout the region. This is why the OpenMP directive #pragma omp single is necessary to mark that only one thread does the work of incrementing the shared t variable and enforcing the two physical boundary conditions. Otherwise, letting all the threads repeat the same work would produce erroneous results.
Another possibility is not to use the large parallel region, but instead to use #pragma omp parallel for in the three locations where #pragma omp for now stands. Such