A negative voxel index value indicates a "finish message" signaling that the
sender process has finished its work. In this case, there is no companion message
containing the correlation value, so the master simply decrements the count and
continues to the next iteration of the loop. When the master has received the
"finish messages" from all the compute processes, the count becomes zero, the
master's while loop ends, and it closes the output file and proceeds to call
MPI_Barrier.
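To make the control flow concrete, the following sketch shows how such a receive loop might look. The variable names, message tags, and the exact message layout are illustrative assumptions, not the actual code from the text.

/* Sketch of the master's receive loop (names and message layout assumed). */
int remaining = num_compute_procs;      /* compute processes still working */
while (remaining > 0) {
    int voxel_index;
    MPI_Status status;
    MPI_Recv(&voxel_index, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);
    if (voxel_index < 0) {              /* "finish message" */
        remaining--;                    /* no companion message follows */
        continue;
    }
    double corr;                        /* companion message: the correlation */
    MPI_Recv(&corr, 1, MPI_DOUBLE, status.MPI_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);
    fprintf(outfile, "%d %f\n", voxel_index, corr);
}
fclose(outfile);
MPI_Barrier(MPI_COMM_WORLD);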
MPI_Barrier and MPI_Allreduce
The compute processes also call MPI_Barrier after completing their work.
MPI_Barrier is a collective operation that takes an MPI communicator as its
only argument; here it is set to the default communicator MPI_COMM_WORLD. All
the processes calling MPI_Barrier wait in the function until every process in
the group has made the call. This ensures that a process can begin
the post-barrier part of its work only after all the other processes have finished the
pre-barrier part of theirs. This functionality is very useful for synchronizing
processes: different phases of computation can be cleanly separated
from one another, ensuring that MPI messages sent during a later phase do not
interfere with messages sent during earlier phases.
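A minimal sketch of this idiom follows; the two phase functions are hypothetical placeholders for the pre-barrier and post-barrier work.

compute_phase_one();            /* may exchange phase-one messages */
MPI_Barrier(MPI_COMM_WORLD);    /* every process blocks here until all arrive */
compute_phase_two();            /* phase-two messages cannot overtake phase-one ones */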
Finally, all the processes call MPI_Allreduce to compute the sum of the
absolute values of the correlations over every voxel pair. While computing the
correlations, each process maintains a running sum of the absolute values of the
correlations it computes (line 69). In the end, these running sums, stored
at all the compute processes, need to be added to obtain the final sum. This is
done efficiently using the collective function MPI_Allreduce, which takes
a number of data elements from each process as its input, applies an associative
global reduction operator, and distributes the results to all the processes. For
example, if the input at each process is an array of 10 integers and the
reduction operation is the sum, then the output is an array of 10 integers con-
taining, element-wise, the sum of the corresponding integers across all the processes.
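The following self-contained sketch illustrates the call; the local value here is a stand-in for the running sum of absolute correlations, not the book's actual variable.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double local_abs_sum = 1.0 + rank;  /* stand-in for the running sum of |correlation| */
    double global_abs_sum;
    MPI_Allreduce(&local_abs_sum, &global_abs_sum, 1, MPI_DOUBLE,
                  MPI_SUM, MPI_COMM_WORLD);
    /* Every process now holds the same global sum. */
    printf("rank %d: global sum = %f\n", rank, global_abs_sum);
    MPI_Finalize();
    return 0;
}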
The first argument to this function is a pointer to the buffer containing the
input data. The second argument is a pointer to the buffer where the output is to
be stored. To save memory and copying overhead, many applications want the
output in the same place as the input; to achieve this, MPI_IN_PLACE can be
passed as the first argument, in which case the second argument points to the
combined input and output buffer. The third and fourth arguments are the number
of elements and their datatype. The fifth argument is the reduction operation to be
performed, and the sixth is the communicator.
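In the in-place form, the call from the previous sketch would instead be written as follows:

double abs_sum = 0.0;   /* holds the local sum before the call, the global sum after */
MPI_Allreduce(MPI_IN_PLACE, &abs_sum, 1, MPI_DOUBLE,
              MPI_SUM, MPI_COMM_WORLD);
/* abs_sum now contains the global sum; no separate output buffer is needed. */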
The set of predefined reduction operators in MPI is quite rich. It includes
MPI_MAX for maximum, MPI_MIN for minimum, MPI_SUM for sum, MPI_PROD for
product, MPI_LAND for logical and, MPI_BAND for bit-wise and, MPI_LOR for
logical or, MPI_BOR for bit-wise or, MPI_LXOR for logical xor, MPI_BXOR for
bit-wise xor, MPI_MAXLOC for the maximum value and its location, and
MPI_MINLOC for the minimum value and its location. A reduction operator
outside this set is rarely needed; however, MPI allows users to create new reduction
operators with MPI_Op_create.
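As an illustration, the sketch below registers a hypothetical operator that reduces to the element-wise maximum of absolute values. MPI_Op_create and MPI_Op_free are standard MPI calls, but the operator itself, along with the values buffer and count n, is invented for this example.

#include <mpi.h>
#include <math.h>

/* Hypothetical user-defined reduction: element-wise maximum of absolute
   values. The combining function must be associative (here it is also
   commutative), as MPI requires. */
static void absmax_fn(void *in, void *inout, int *len, MPI_Datatype *dtype) {
    double *a = (double *)in, *b = (double *)inout;
    for (int i = 0; i < *len; i++)
        b[i] = fmax(fabs(a[i]), fabs(b[i]));
}

/* Registration and use (inside an initialized MPI program): */
MPI_Op absmax;
MPI_Op_create(absmax_fn, 1 /* commutative */, &absmax);
MPI_Allreduce(MPI_IN_PLACE, values, n, MPI_DOUBLE, absmax, MPI_COMM_WORLD);
MPI_Op_free(&absmax);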