Parallel Computation in Simulating Diffusion and Deformation in Human Brain - Parallel Computing for Bioinformatics and Computational Biology


is desirable that both the preconditioner construction phase and the preconditioned solution phase possess a high degree of parallelism. In Ref. [27], two classes of preconditioners suitable for parallel implementation are investigated. The first is the class of sparse approximate inverse (SAI) preconditioners. An SAI preconditioner, as its name implies, is an approximation to $A^{-1}$, the inverse of a matrix $A$. Both its construction and its application in the iterative solution, which require nothing but matrix-vector products, allow a high degree of parallelism and can be implemented in parallel without much difficulty. The SAI preconditioning technique discussed in Ref. [27] is based on least-squares (Frobenius norm) minimization [28] using a priori sparsity patterns [29]: one seeks to approximate the inverse of a (usually sparse) matrix $A$ by a sparse matrix $P$ such that $AP \approx I$ in some sense, where $I$ is the identity matrix. The second class considered in Ref. [27] is block-diagonal preconditioning, which is also well suited to parallel architectures; particular attention is paid to the banded-block-diagonal (BBD) preconditioners. This class is based on the block Jacobi method, in which a preconditioner is derived from a partitioning of the variables. The basic idea is to isolate the preconditioning so that it is local to each processor. On parallel computers it is natural to let the partitioning coincide with the division of the variables over the processors.
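The two ideas above can be illustrated with a minimal, serial sketch in plain Python. It is not the implementation of Ref. [27]. It assumes a small dense-stored matrix, uses the sparsity pattern of $A$ itself as the a priori pattern for $P$, solves each column's least-squares problem through the normal equations (production codes typically use a QR factorization instead), and uses plain dense diagonal blocks rather than the banded blocks of the BBD preconditioner. All function names and the test matrix are hypothetical. The key point is that minimizing $\|AP - I\|_F$ decouples into one independent small least-squares problem per column of $P$, and that the block Jacobi solve touches only one diagonal block at a time, so both loops could be distributed over processors.

```python
def solve_dense(M, b):
    """Solve the small dense system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def sai_same_pattern(A):
    """Return P minimizing ||AP - I||_F, with P restricted to the sparsity pattern of A."""
    n = len(A)
    P = [[0.0] * n for _ in range(n)]
    for j in range(n):  # columns are independent, so this loop is parallelizable
        J = [i for i in range(n) if A[i][j] != 0.0]  # allowed nonzero rows of column j
        # Normal equations of the column problem: min over m of ||A[:, J] m - e_j||_2
        G = [[sum(A[r][a] * A[r][b] for r in range(n)) for b in J] for a in J]
        rhs = [A[j][a] for a in J]  # entries of A[:, J]^T e_j
        m = solve_dense(G, rhs)
        for a, val in zip(J, m):
            P[a][j] = val
    return P


def frobenius_residual(A, P):
    """Return ||AP - I||_F."""
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(n):
            ap = sum(A[i][k] * P[k][j] for k in range(n))
            total += (ap - (1.0 if i == j else 0.0)) ** 2
    return total ** 0.5


def block_jacobi_apply(A, r, block_size):
    """Apply z = M^{-1} r, where M keeps only the diagonal blocks of A."""
    n = len(A)
    z = [0.0] * n
    for start in range(0, n, block_size):  # one block per (hypothetical) processor
        idx = range(start, min(start + block_size, n))
        block = [[A[i][k] for k in idx] for i in idx]
        zb = solve_dense(block, [r[i] for i in idx])
        for i, val in zip(idx, zb):
            z[i] = val
    return z


# Small tridiagonal test matrix (a hypothetical stand-in, not the brain diffusion matrix).
n = 6
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
P = sai_same_pattern(A)
D = [[1.0 / A[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]  # diagonal scaling
print("||AP - I||_F with SAI pattern of A:", frobenius_residual(A, P))
print("||AP - I||_F with diagonal scaling:", frobenius_residual(A, D))
z = block_jacobi_apply(A, [1.0] * n, block_size=3)
print("block Jacobi result for r = ones:", z)
```

Because the diagonal pattern is contained in the pattern of $A$, the SAI residual printed first cannot exceed the residual of the simple diagonal scaling. In the block Jacobi routine, choosing `block_size` equal to the number of variables per processor mirrors the remark that the partitioning should coincide with the distribution of variables.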

In Ref. [27], numerical results are presented that compare the performance of the SAI and BBD preconditioners in the simulation of anisotropic diffusion in the human brain. The numerical tests were conducted on a 32-processor subcomplex (HP PA-RISC 8700 processors running at 750 MHz) of an HP Superdome supercomputer at the University of Kentucky. Each processor has 2 GB of local memory. The running time reported in all cases is less than 100 s. The experimental results show that the SAI preconditioners based on an a priori sparsity pattern provide a more robust and efficient parallel preconditioning technique than the BBD preconditioners for the brain diffusion simulation problem. Both the SAI and BBD preconditioners demonstrate good, close to linear speedup, but only the convergence of the SAI preconditioner is unaffected by the number of processors employed. The SAI preconditioners take more CPU time to construct, but need less memory space to store, than the BBD preconditioners. The numerical tests also illustrate that the best performance is obtained by choosing optimum values for the corresponding parameters, $\tau_1$ and $\tau_2$ in SAI and $w_1$ and $w_2$ in BBD, which have direct and distinct influences on the quality and construction expense of the preconditioners, the convergence rate of the iterative solution, and the total computational effort. The scalability of the SAI and BBD preconditioners is compared in Figure 5.5, where superlinear speedups are observed for the BBD preconditioner. This can be attributed to caching effects. When the problem is distributed over multiple processors, each subproblem is only a fraction of the original problem size. A smaller problem is likely to achieve a higher cache hit rate, and the resulting time, even after the communication time is included, is still better than the time on a single processor with more cache misses.
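As a minimal sketch of how such scalability figures are usually derived, the snippet below computes the speedup $S_p = T_1 / T_p$ and the parallel efficiency $E_p = S_p / p$ from wall-clock times, and flags a run as superlinear when $S_p > p$. The function name is hypothetical and the timings are made up for illustration; they are not measurements from Ref. [27] or Figure 5.5.

```python
def speedup_table(timings):
    """timings maps processor count p to wall-clock seconds T_p and must include p = 1.

    Returns a list of (p, speedup, efficiency, is_superlinear) tuples sorted by p.
    """
    t1 = timings[1]
    rows = []
    for p in sorted(timings):
        s = t1 / timings[p]          # speedup S_p = T_1 / T_p
        rows.append((p, s, s / p, s > p))
    return rows


# Made-up timings for illustration only; NOT data from Ref. [27].
hypothetical = {1: 80.0, 2: 41.0, 4: 19.0, 8: 10.5}
rows = speedup_table(hypothetical)
for p, s, e, superlinear in rows:
    print(f"p={p:2d}  speedup={s:5.2f}  efficiency={e:4.2f}  superlinear={superlinear}")
```

An efficiency above 1 is the signature of the caching effect described above: the per-processor working set has shrunk enough that the gain in cache hit rate outweighs the communication cost.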

