10.3.1 OpenMP Programming
The OpenMP standard is applicable to shared-memory systems. It assumes that a
parallel program has one or several parallel regions, where a parallel region is a
piece of code that can be parallelized, e.g., a for-loop that implements the
composite trapezoidal integration rule (10.6). Between the parallel regions the
code is executed sequentially, just like a serial program. Within each parallel
region, however, a number of threads are spawned to execute concurrently. The
programmer's responsibility is to insert OpenMP directives together with suitable
clauses, so that an OpenMP-capable compiler can use these hints to automatically
parallelize the annotated parallel regions. A great advantage is that a
non-OpenMP-capable compiler will simply ignore the directives and treat the code
as purely serial.
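As a small illustrative sketch (not taken from the text), the following C
program creates one parallel region; every spawned thread executes the enclosed
block, and the OpenMP runtime functions omp_get_thread_num and
omp_get_num_threads report its identity:

#include <stdio.h>
#include <omp.h>

int main(void)
{
  /* this part runs sequentially, on a single thread */
  #pragma omp parallel
  {
    /* parallel region: every spawned thread executes this block */
    printf("hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
  }
  /* execution is sequential again after the region */
  return 0;
}

With gcc, for example, such a program is compiled with the -fopenmp flag so
that the directives are honored.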
The OpenMP directive for constructing a parallel region is #pragma omp
parallel in the C and C++ programming languages. The two most important
OpenMP directives in C/C++ for parallelization are (1) #pragma omp for, suitable
for data parallelism associated with a for-loop, and (2) #pragma omp sections,
suitable for task parallelism. In the Fortran language, the OpenMP directives and
clauses have slightly different names. For in-depth discussions about OpenMP
programming, we refer the reader to [8] and [9].
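As a brief hypothetical sketch of the task-parallel directive, two independent
tasks, here called task_a and task_b purely for illustration, can be placed in
separate section blocks; different threads may then execute them concurrently:

#pragma omp parallel sections
{
  #pragma omp section
  task_a();  /* one thread executes this task */
  #pragma omp section
  task_b();  /* another thread may execute this one at the same time */
}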
Example 10.6. Let us show below an OpenMP parallelization of the composite
trapezoidal rule (10.6), implemented in C/C++. Since the computational work in a
serial implementation is typically contained in a for-loop, the OpenMP
parallelization should let each thread carry out the work for one segment of the
for-loop. This results in a local s_p value on each thread, as described in
(10.7). All the s_p values then need to be added up by a reduction operation, as
mentioned in Section 10.2.2. Luckily, a programmer does not have to explicitly
specify how the for-loop is to be divided; this is handled automatically by
OpenMP behind the scenes. The needed reduction operation is also incorporated in
OpenMP's #pragma omp for directive:
h = (b-a)/n;
sum = 0.;
#pragma omp parallel for reduction(+:sum)
for (i=1; i<=n-1; i++)
  sum += f(a+i*h);
sum += 0.5*(f(a)+f(b));
sum *= h;
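For completeness, here is a minimal self-contained program built around this
fragment; the integrand f, the interval [a,b], and the value of n are chosen
purely for illustration:

#include <stdio.h>
#include <math.h>

/* example integrand, chosen only for illustration */
double f(double x) { return sin(x); }

int main(void)
{
  double a = 0.0, b = 1.0, h, sum;
  int i, n = 1000000;

  h = (b-a)/n;
  sum = 0.;
  /* each thread accumulates a private copy of sum;
     the copies are added together at the end of the loop */
  #pragma omp parallel for reduction(+:sum)
  for (i=1; i<=n-1; i++)
    sum += f(a+i*h);
  sum += 0.5*(f(a)+f(b));
  sum *= h;

  printf("approximate integral = %g\n", sum);
  return 0;
}

Compiled with, e.g., gcc -fopenmp (plus -lm for the math library), the program
runs in parallel; without the flag the same source compiles and runs serially.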
In comparison with a serial implementation, the only difference is the line
starting with #pragma omp parallel for, which is the combined OpenMP directive
for parallelizing a single for-loop that constitutes an entire parallel region.
The reduction(+:sum) clause on the same line enforces, at the end of the loop, a
reduction operation that adds up the private sum variables of all the threads.
Additional clauses can also be added. For example, schedule(static,chunksize)
will divide the loop iterations statically into chunks of size chunksize, and
the chunks are then assigned to the threads in a round-robin fashion.
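As a concrete (hypothetical) illustration, the directive line of the example
above could be extended to

#pragma omp parallel for reduction(+:sum) schedule(static,100)

so that the n-1 loop iterations are handed out in fixed chunks of 100
iterations at a time, in round-robin order over the threads.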