10.3.1 OpenMP Programming
The OpenMP standard is applicable to shared-memory systems. It assumes that a
parallel program has one or several parallel regions, where a parallel region is a
piece of code that can be parallelized, e.g., a for-loop that implements the
composite trapezoidal integration rule (10.6). Between the parallel regions the
code is executed sequentially, just like a serial program. Within each parallel
region, however, a number of threads are spawned to execute concurrently. The
programmer's responsibility is to insert OpenMP directives together with suitable
clauses, so that an OpenMP-capable compiler can use these hints to automatically
parallelize the annotated parallel regions. A great advantage is that a
non-OpenMP-capable compiler will simply ignore the directives and treat the code
as purely serial.
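As a small illustrative sketch (not taken from the text), the following C
program creates one parallel region; every spawned thread executes the enclosed
block, and the OpenMP runtime functions omp_get_thread_num and
omp_get_num_threads report its identity:

#include <stdio.h>
#include <omp.h>

int main(void)
{
  /* this part runs sequentially, on a single thread */
  #pragma omp parallel
  {
    /* parallel region: every spawned thread executes this block */
    printf("hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
  }
  /* execution is sequential again after the region */
  return 0;
}

With gcc, for example, such a program is compiled with the -fopenmp flag so
that the directives are honored.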
The OpenMP directive for constructing a parallel region is #pragma omp
parallel in the C and C++ programming languages. The two most important
OpenMP directives in C/C++ for parallelization are (1) #pragma omp for, suitable
for data parallelism associated with a for-loop, and (2) #pragma omp sections,
suitable for task parallelism. In the Fortran language, the OpenMP directives and
clauses have slightly different names. For in-depth discussions about OpenMP
programming, we refer the reader to [8] and [9].
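As a brief hypothetical sketch of the task-parallel directive, two independent
tasks, here called task_a and task_b purely for illustration, can be placed in
separate section blocks; different threads may then execute them concurrently:

#pragma omp parallel sections
{
  #pragma omp section
  task_a();  /* one thread executes this task */
  #pragma omp section
  task_b();  /* another thread may execute this one at the same time */
}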
Example 10.6. Let us show below an OpenMP parallelization of the composite
trapezoidal rule (10.6), implemented in C/C++. Since the computational work in a
serial implementation is typically contained in a for-loop, the OpenMP
parallelization should let each thread carry out the work for one segment of the
for-loop. This results in a local s_p value on each thread, as described in
(10.7). All the s_p values then need to be added up by a reduction operation, as
mentioned in Section 10.2.2. Luckily, a programmer does not have to explicitly
specify how the for-loop is to be divided; this is handled automatically by
OpenMP behind the scenes. The needed reduction operation is also incorporated in
OpenMP's #pragma omp for directive:
h = (b-a)/n;
sum = 0.;
#pragma omp parallel for reduction(+:sum)
for (i=1; i<=n-1; i++)
  sum += f(a+i*h);
sum += 0.5*(f(a)+f(b));
sum *= h;
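For completeness, here is a minimal self-contained program built around this
fragment; the integrand f, the interval [a,b], and the value of n are chosen
purely for illustration:

#include <stdio.h>
#include <math.h>

/* example integrand, chosen only for illustration */
double f(double x) { return sin(x); }

int main(void)
{
  double a = 0.0, b = 1.0, h, sum;
  int i, n = 1000000;

  h = (b-a)/n;
  sum = 0.;
  /* each thread accumulates a private copy of sum;
     the copies are added together at the end of the loop */
  #pragma omp parallel for reduction(+:sum)
  for (i=1; i<=n-1; i++)
    sum += f(a+i*h);
  sum += 0.5*(f(a)+f(b));
  sum *= h;

  printf("approximate integral = %g\n", sum);
  return 0;
}

Compiled with, e.g., gcc -fopenmp (plus -lm for the math library), the program
runs in parallel; without the flag the same source compiles and runs serially.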
In comparison with a serial implementation, the only difference is the line
starting with #pragma omp parallel for, which is the combined OpenMP directive
for parallelizing a single for-loop that constitutes an entire parallel region.
The reduction(+:sum) clause on the same line enforces, at the end of the loop, a
reduction operation that adds up the private sum variables of all the threads.
Additional clauses can also be added. For example, schedule(static,chunksize)
will divide the loop iterations statically into chunks of size chunksize, and
the chunks are then assigned to the threads in a round-robin fashion.
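As a concrete (hypothetical) illustration, the directive line of the example
above could be extended to

#pragma omp parallel for reduction(+:sum) schedule(static,100)

so that the n-1 loop iterations are handed out in fixed chunks of 100
iterations at a time, in round-robin order over the threads.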