Biomedical Engineering Reference
In-Depth Information
is desirable that both the preconditioner construction phase and the preconditioned
solution phase possess a high degree of parallelism. In Ref. [27], two classes of
preconditioners suitable for parallel implementation are investigated. The first parallel
preconditioning technique is a class of sparse approximate inverse (SAI) preconditioners.
The SAI preconditioner, as its name implies, is an approximation to $A^{-1}$,
the inverse of a matrix A. Both its construction and its application in the iterative
solution, which require nothing but matrix-vector products, allow a high degree of
parallelism and can be implemented in parallel without much difficulty. The SAI
preconditioning technique discussed in Ref. [27] is based on the idea of least-squares
(Frobenius norm) minimization [28], using a priori sparsity patterns [29]: one
seeks to approximate the inverse of a (usually sparse) matrix A by a sparse matrix
P such that $AP \approx I$ in some sense, where I is the identity matrix. The other
class of preconditioners considered in Ref. [27] is block-diagonal preconditioning,
which is also well suited to parallel architectures. In particular, close attention is paid to
the banded-block-diagonal (BBD) preconditioners. This class of preconditioners is
based on the block Jacobi method, in which a preconditioner is derived from a partitioning
of the variables. The basic idea is to isolate the preconditioning so that it is
local to each processor. In fact, on parallel computers it is natural to let the partitioning
coincide with the division of the variables over the processors.
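The two ideas above can be illustrated with a minimal sketch. It is not the implementation of Ref. [27]: the dense list-of-lists storage, the helper names (solve_linear, sai_preconditioner, block_jacobi_apply, residual_norm), the small tridiagonal test matrix, and the three candidate sparsity patterns are all assumptions made for illustration. The SAI part minimizes the Frobenius norm of AP - I one column at a time over a prescribed pattern, here through the normal equations for brevity (production codes typically use a QR factorization). The block part is a plain block Jacobi solve and does not reproduce the banded variant or its parameters.

```python
def solve_linear(M, b):
    """Solve the small dense system M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def sai_preconditioner(A, pattern):
    """Minimize ||A P - I||_F column by column.

    pattern[j] lists the rows allowed to be nonzero in column j of P.
    """
    n = len(A)
    P = [[0.0] * n for _ in range(n)]
    for j in range(n):  # columns are independent, hence easy to parallelize
        rows = pattern[j]
        # Normal equations of min || A[:, rows] m - e_j ||_2
        gram = [[sum(A[i][p] * A[i][q] for i in range(n)) for q in rows] for p in rows]
        rhs = [A[j][p] for p in rows]  # entries of A[:, rows]^T e_j
        for p, value in zip(rows, solve_linear(gram, rhs)):
            P[p][j] = value
    return P


def block_jacobi_apply(A, block_size, r):
    """Return z with M z = r, where M keeps only the diagonal blocks of A."""
    n = len(A)
    z = [0.0] * n
    for start in range(0, n, block_size):  # each block would live on one processor
        idx = range(start, min(start + block_size, n))
        block = [[A[i][k] for k in idx] for i in idx]
        for i, value in zip(idx, solve_linear(block, [r[i] for i in idx])):
            z[i] = value
    return z


def residual_norm(A, P):
    """Frobenius norm of A P - I."""
    n = len(A)
    total = 0.0
    for i in range(n):
        for j in range(n):
            entry = sum(A[i][k] * P[k][j] for k in range(n)) - (1.0 if i == j else 0.0)
            total += entry * entry
    return total ** 0.5


# Hypothetical test matrix: tridiagonal, 4 on the diagonal and -1 next to it.
n = 6
A = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]

# Three a priori patterns of increasing density (all assumed for illustration).
diag_pattern = [[j] for j in range(n)]
a_pattern = [[i for i in range(n) if A[i][j] != 0.0] for j in range(n)]
full_pattern = [list(range(n)) for _ in range(n)]

res_diag = residual_norm(A, sai_preconditioner(A, diag_pattern))
res_sparse = residual_norm(A, sai_preconditioner(A, a_pattern))
res_full = residual_norm(A, sai_preconditioner(A, full_pattern))
print(res_diag, res_sparse, res_full)

# Block Jacobi with two 3x3 diagonal blocks applied to a vector of ones.
z = block_jacobi_apply(A, 3, [1.0] * n)
print(z)
```

Enlarging the pattern can only lower the residual, since each column's least-squares problem is then solved over a larger set of unknowns, while the cost of constructing and storing P grows. With the full pattern the sketch recovers the exact inverse up to rounding. In the block solve, no block needs data from any other block, which is what makes the preconditioning local to each processor.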
In Ref. [27], numerical results are presented that compare the performance
of the SAI and BBD preconditioners in the simulation of anisotropic diffusion
in the human brain. The numerical tests are conducted on a 32-processor (HP PA-RISC
8700 processors running at 750 MHz) subcomplex of an HP Superdome supercomputer
at the University of Kentucky. Each processor has 2 GB of local memory. The
running time reported in all cases is less than 100 s. The experimental results show
that the SAI preconditioners based on a priori sparsity pattern provide a more robust
and efficient parallel preconditioning technique than the BBD preconditioners for the
brain diffusion simulation problem. Of the two, only the SAI preconditioner has convergence
behavior that is unaffected by the number of processors employed, although both
the SAI and BBD preconditioners demonstrate good, close-to-linear
speedup. The SAI preconditioners take more CPU time to construct but need less memory
to store than the BBD preconditioners. The numerical tests also illustrate
that the best performance of the preconditioners can be obtained by choosing optimum
values for their corresponding parameters, $\tau_1$ and $\tau_2$ in SAI, and $w_1$ and $w_2$ in
BBD, which have direct and distinct influences on the quality and the construction
cost of the preconditioners, the convergence rate of the iterative solution, and
the total computational effort. The comparison of scalability between the SAI and
BBD preconditioners is given in Figure 5.5, where superlinear speedups are observed
for the BBD preconditioner. This can be attributed to caching effects. When the
problem is distributed over multiple processors, each subproblem is only a
fraction of the original problem size. A smaller problem is more likely
to achieve a higher cache hit rate, and the resulting run time, even after accounting
for communication time, is still better than that on a single processor with more cache
misses.
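As a reminder of what "superlinear" means here, the short sketch below computes speedup and parallel efficiency from wall-clock times. The function names and the timings are hypothetical and chosen only to illustrate the definitions. They are not measurements from Ref. [27] or Figure 5.5.

```python
def speedup(t_serial, t_parallel):
    """Ratio of the single-processor time to the time on p processors."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Speedup divided by the processor count; 1.0 corresponds to linear speedup."""
    return speedup(t_serial, t_parallel) / processors


def is_superlinear(t_serial, t_parallel, processors):
    """True when the speedup exceeds the number of processors used."""
    return speedup(t_serial, t_parallel) > processors


# Hypothetical run times in seconds, keyed by processor count (illustrative only).
timings = {1: 96.0, 2: 47.0, 4: 22.0, 8: 12.5}

for p, t in timings.items():
    print(p, speedup(timings[1], t), efficiency(timings[1], t, p),
          is_superlinear(timings[1], t, p))
```

An efficiency above 1.0 is what the caching argument explains: each processor works on a subproblem that fits in cache better, so its share of the work runs faster than 1/p of the serial time.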