4.2. Parallelization of the Co-Volume Algorithm Using MPI
There are two main goals for parallelization of a program: to handle huge
amounts of data that cannot be placed into the memory of one single serial com-
puter, and to run the program faster. Let us suppose that in terms of running time,
a fraction p of a program can be parallelized. In an ideal situation, executing the
parallelized program on n processors, the running time will be (1 − p) + p/n of the
serial running time. We can see that if, e.g., only 80% of the program can be
parallelized, i.e., p = 0.8, the speed-up, equal to 1/((1 − p) + p/n) and thus estimated
from above by 1/(1 − p), cannot exceed 5, even with infinitely many processors.
This illustrative example shows that it is very
important to identify the fraction of the program that can be parallelized and maxi-
mize it. Fortunately, every time-consuming part of our algorithm can be efficiently
parallelized, either directly (reading and writing data, computing the coefficients of
the linear system) or by replacing the serial linear solver (the SOR method) with a
solver that parallelizes efficiently, e.g., the RED-BLACK SOR method. The next
important issue in parallelization is to balance the workload of the parallel
processes. This problem can be addressed by splitting the data as uniformly as
possible, so that every process performs approximately the same number
of operations. The final and very important step is to minimize the time spent for
communication. This leads, e.g., to a requirement that the data transmitted (e.g.,
multidimensional arrays) be contiguous in memory, so that one can exchange it
among processors directly in one message, using a single call of the MPI send and
receive subroutines. For parallel algorithms implemented in the C language this means
that multidimensional arrays should be distributed in row-wise blocks; better said, we
have to split a multidimensional array such as u[i][j][k] along the first index i
(cf. Figure 7).
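To make this layout concrete, the following minimal sketch (not the authors' code) distributes a 3D array u[i][j][k] in row-wise blocks along the first index i and exchanges one overlapping plane with each neighbouring process in a single MPI_Sendrecv call. The grid size N, the variable names, and the assumption that N is divisible by the number of processes are chosen here only for illustration.

/* Sketch: row-wise block distribution of a 3D array along the first index i,
   stored contiguously, so that one overlapping plane can be exchanged with a
   neighbour in a single MPI call.  N and all names are illustrative only,
   and N is assumed to be divisible by the number of processes. */
#include <mpi.h>
#include <stdlib.h>

#define N 128                       /* global number of grid planes in i      */

int main(int argc, char **argv)
{
    int myid, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int nloc = N / nprocs;          /* planes owned by this process           */
    /* one extra "ghost" plane on each side for the overlap with neighbours   */
    double *u = calloc((size_t)(nloc + 2) * N * N, sizeof(double));

    int up    = (myid > 0)          ? myid - 1 : MPI_PROC_NULL;
    int down  = (myid < nprocs - 1) ? myid + 1 : MPI_PROC_NULL;
    int plane = N * N;              /* one i-plane is contiguous in memory    */

    /* send the last owned plane down and receive the upper ghost plane,
       then vice versa -- each direction is one message of contiguous data    */
    MPI_Sendrecv(u + nloc * plane,        plane, MPI_DOUBLE, down, 0,
                 u,                       plane, MPI_DOUBLE, up,   0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(u + plane,               plane, MPI_DOUBLE, up,   1,
                 u + (nloc + 1) * plane,  plane, MPI_DOUBLE, down, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    free(u);
    MPI_Finalize();
    return 0;
}

Because each i-plane occupies a contiguous block of memory, no derived datatypes or packing are needed; a whole overlapping plane travels in one send and one receive.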
In Figure 8 we can see the main structure of our parallel program. First, the
MPI environment is initialized, and every process involved in the parallel execution gets
its rank stored in the variable myid = 0, ..., nprocs − 1, where 0 represents the root
process and nprocs is the number of parallel processes. Then the root process reads
the time step τ and the upper estimate of the number of time steps nts.
These parameters are sent to all processes by the MPI_Bcast subroutine. In the
beginning of the algorithm we also read the image, compute the discrete values of
g T in function Coefficients0, and construct the initial segmentation function. All
these procedures work independently on their own (overlapping) subsets of data,
and no exchange of information between processes is necessary in this part of the
program. Then, in a cycle, we call the procedure EllipticStep, which in every time
step computes the coefficients (25) and solves the linear system (26). In the
iterative solution of the linear system we need to exchange the overlapping
data between neighbouring processes. The cycle is finished when condition (23)
is fulfilled, and finally the MPI environment is finalized.
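The following sketch (not the authors' actual code) illustrates this overall structure. The application-specific steps (reading the image, Coefficients0, EllipticStep) are indicated only by comments, the parameter values are placeholders, and the evaluation of stopping condition (23) via an MPI_Allreduce of a local flag is an assumption made purely for illustration.

/* Sketch of the main program structure described above (not the authors'
   code); application-specific work is indicated by comments only.         */
#include <mpi.h>

int main(int argc, char **argv)
{
    int myid, nprocs;
    MPI_Init(&argc, &argv);                 /* initialize the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);   /* myid = 0, ..., nprocs - 1      */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double tau = 0.0;   /* time step, read by the root process               */
    int    nts = 0;     /* upper estimate of the number of time steps        */
    if (myid == 0) {
        /* the root process reads tau and nts, e.g. from a parameter file    */
        tau = 0.01;     /* placeholder values, for illustration only         */
        nts = 100;
    }
    /* send the parameters to all processes                                  */
    MPI_Bcast(&tau, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(&nts, 1, MPI_INT,    0, MPI_COMM_WORLD);

    /* each process reads its own (overlapping) block of the image, computes
       the discrete values of g (Coefficients0) and builds the initial
       segmentation function -- no communication is needed in this part      */

    int finished = 0;
    for (int step = 0; step < nts && !finished; step++) {
        /* EllipticStep: compute coefficients (25) and solve system (26);
           the iterative solver exchanges overlapping data between
           neighbouring processes in every iteration                         */

        /* check stopping condition (23); assuming all processes must agree,
           a local flag is combined with a global reduction                  */
        int local_stop = 0;  /* set to 1 when (23) holds locally             */
        MPI_Allreduce(&local_stop, &finished, 1, MPI_INT, MPI_LAND,
                      MPI_COMM_WORLD);
    }

    MPI_Finalize();                         /* close the MPI environment     */
    return 0;
}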
Figure 9 shows our distribution of data among the processes. Both the 3D
image and the discrete values of the segmentation function are represented by a
 