an approach will avoid using #pragma omp single, but will have to repeatedly
spawn new threads and then terminate them, giving rise to more overhead in thread
creation.
OpenMP programming is quite simple, because the standard consists of only
a small number of directives and clauses. Much of the parallelization work, such
as work division and task scheduling, is hidden from the user. On the one hand,
this programming philosophy provides great user friendliness. On the other hand,
however, the user has very little control over which thread accesses which part of
the shared memory. This limitation typically hurts performance on modern processor
architectures, which rely heavily on the use of caches. To fully utilize a cache, the
data items that are read from memory into cache should be reused as much as
possible. Data locality, either when data items located close together in memory
participate in one computing operation, or when one data item is repeatedly used
in consecutive operations, gives rise to good cache usage, but is hard to enforce in
an OpenMP program.
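As a small illustration (using hypothetical array and function names), the following
sketch shows one way of helping data locality in an OpenMP loop: the matrix is
traversed row by row, matching the row-major layout of C arrays, and the static
schedule gives each thread a contiguous block of rows.

    #include <omp.h>

    #define N 1000

    /* Scale all entries of an N x N matrix. The loop nest traverses the
       matrix row by row, so consecutive iterations touch data items that
       lie close together in memory and are likely already in cache. */
    void scale_matrix(double a[N][N], double factor)
    {
      /* schedule(static) assigns each thread a contiguous block of rows */
      #pragma omp parallel for schedule(static)
      for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)   /* innermost index varies fastest */
          a[i][j] *= factor;
    }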
10.3.2
MPI Programming
MPI is a standard specification for message-passing programming on distributed-
memory computers. On shared-memory systems it is also common to have MPI
installations that are implemented using efficient shared-memory intrinsics for pass-
ing messages. Thus MPI is the most portable approach to parallel programming.
There are two parts of MPI: The first part contains more than 120 functions
and constitutes the core of MPI [26], whereas the second part contains advanced
extensions [14]. Here, we only intend to give a brief introduction to MPI pro-
gramming. For more in-depth learning, we refer the reader to the standard MPI
textbooks [15, 23].
A message in the context of MPI is simply an array of data elements of a prede-
fined data type. The logical execution units in an MPI program are called processes,
which are initiated by the MPI_Init call at the start of a program and terminated
by the MPI_Finalize call at the end. The number of started MPI processes is usu-
ally the same as the number of available processors, although one processor can
in principle be assigned several MPI processes. All the started MPI processes
constitute a so-called global MPI communicator named MPI_COMM_WORLD, which can
be subdivided into smaller communicators if needed. Almost all the MPI functions
require an input argument of type MPI communicator, which together with a process
rank (between 0 and P-1) is used to specify a particular MPI process.
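To make these notions concrete, the following minimal sketch shows the typical
skeleton of an MPI program written in C: each started process calls MPI_Init first
and MPI_Finalize last, and queries its own rank and the total number of processes
within MPI_COMM_WORLD.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
      int rank, size;

      MPI_Init(&argc, &argv);                 /* start the MPI processes      */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank: 0..P-1  */
      MPI_Comm_size(MPI_COMM_WORLD, &size);   /* P, the number of processes   */

      printf("Hello from process %d of %d\n", rank, size);

      MPI_Finalize();                         /* terminate the MPI processes  */
      return 0;
    }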
Compared with OpenMP, MPI programming is clearly more difficult, not only
because of the large number of available functions, but also because one MPI call
normally requires many arguments. For example, the simplest function in C for
sending a message is
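    int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm);

where buf points to the data to be sent, count and datatype describe the length and
type of the message, dest is the rank of the receiving process, tag is a user-chosen
label for the message, and comm is the communicator. Even this simplest send
routine thus takes six arguments.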
 