data management and algorithms. At the working resolution, given the size of a
reconstructed arterial tree (linear edge ~10 cm), the resulting simulation box
would have a size of ~10^11 Δx^3, clearly beyond the capabilities of most commodity
and high-end computers. Hence, for the LB simulation and all the ancillary stages
of simulation (mesh construction and data analysis), only the active computational
nodes, those residing inside the arterial vessel, should be taken into account,
resulting in huge savings in memory (about three orders of magnitude) and CPU
time. The scheme relies on representing sparse mesh regions as a compact one-
dimensional primary array, complemented by a secondary array that contains the
Cartesian location of each element. In addition, neighboring mesh points are
accessed by constructing a connectivity matrix whose elements are pointers to the
primary storage array. For the LB mesh topology, this matrix requires the storage of
18 N_mesh elements, where N_mesh is the number of active computational nodes.
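The indirect-addressing layout described above can be sketched as follows. This is a minimal illustrative NumPy version, not MUPHY's actual implementation; the function name, the two-direction stencil in the example, and the use of -1 as a null pointer are assumptions made for clarity.

```python
import numpy as np

def build_sparse_mesh(active, stencil):
    """Build indirect-addressing arrays for a sparse mesh.

    active : 3-D boolean array marking the active (fluid) nodes
    stencil: list of (dx, dy, dz) neighbor offsets (18 for D3Q19)
    Returns (coords, neighbors) where
      coords[i]       -> Cartesian (x, y, z) of active node i (secondary array)
      neighbors[i, k] -> index into the primary storage array of the k-th
                         neighbor of node i, or -1 if it is not a fluid node
    """
    # secondary array: Cartesian location of each active node
    coords = np.argwhere(active)                       # shape (N_mesh, 3)
    # map full-grid coordinates back to primary-array indices
    index = -np.ones(active.shape, dtype=np.int64)
    index[tuple(coords.T)] = np.arange(len(coords))
    # connectivity matrix: one pointer per stencil direction per node,
    # i.e. len(stencil) * N_mesh stored elements in total
    neighbors = np.full((len(coords), len(stencil)), -1, dtype=np.int64)
    for k, off in enumerate(stencil):
        nbr = coords + off
        # keep only neighbor coordinates that fall inside the box
        ok = np.all((nbr >= 0) & (nbr < active.shape), axis=1)
        neighbors[ok, k] = index[tuple(nbr[ok].T)]
    return coords, neighbors
```

Only the active nodes are stored, so the memory footprint scales with N_mesh rather than with the full bounding-box volume, which is the source of the roughly three-orders-of-magnitude saving quoted above.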
The indirect addressing approach demands some extra programming effort and may
incur a minor computational penalty (much reduced on modern computing platforms)
when simulating non-sparse geometries. This choice provides strategic
advantages in handling sparse and generic systems, allowing us to handle a number
of fluid nodes of the order of 10^9, a size sufficient to study extended arterial systems
with a high degree of ramification. We further mention the possibility of simulating
the dynamical trajectories of active and passive tracers. Different ways to exchange
hydrodynamic information locally between tracers and mesh nodes can be cast
within the indirect addressing framework, without major efficiency penalties [ 13 ].
To exploit the features of modern computing platforms, the MUPHY code has
been highly tuned and parallelized. The code takes advantage of optimizations like
(a) removal of redundant operations; (b) buffering of multiply used operations [ 45 ],
and (c) fusion of the collision and streaming in a single step. This last technique,
already in use in other high-performance LB codes [ 45 ], significantly reduces data
traffic between main memory and processor. With these optimizations in place,
we achieve about 30% of the peak performance of a single core of a modern CPU, in
line with other highly tuned LB kernels [45]. Indeed, the algorithm for the update of
the LB populations has an unfavorable ratio between the number of floating-point
operations and the number of memory accesses; no optimized libraries are available,
unlike for other computational kernels (e.g., matrix operations or FFTs); and the
scattered data access pattern of the LB method prevents exploiting the SIMD-like
operations of many modern processors.
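The fusion of collision and streaming into a single step can be illustrated with a toy one-dimensional, two-velocity scheme. This is a deliberately simplified sketch, far from a real D3Q19 kernel; the model, function name, and periodic boundaries are assumptions made for illustration.

```python
import numpy as np

def fused_collide_stream(f, omega):
    """One fused collision+streaming step for a toy 1-D two-velocity
    lattice with periodic boundaries.

    f     : array of shape (2, N); f[0] moves right, f[1] moves left
    omega : BGK relaxation parameter
    Returns the post-collision, post-streaming populations.
    """
    rho = f[0] + f[1]                 # local density
    u = (f[0] - f[1]) / rho           # local velocity (toy model)
    # simplified local equilibria for the two directions
    feq0 = 0.5 * rho * (1.0 + u)
    feq1 = 0.5 * rho * (1.0 - u)
    # collide ...
    post0 = f[0] + omega * (feq0 - f[0])
    post1 = f[1] + omega * (feq1 - f[1])
    # ... and stream in the same pass: each post-collision value is
    # written directly to its destination node, so the populations are
    # traversed once instead of twice, halving main-memory traffic
    out = np.empty_like(f)
    out[0] = np.roll(post0, 1)        # right-movers hop to x+1
    out[1] = np.roll(post1, -1)       # left-movers hop to x-1
    return out
```

In a two-pass implementation the collision output would be written back to memory and then re-read by a separate streaming loop; fusing the two steps eliminates that intermediate round trip, which is exactly the data-traffic reduction the text refers to.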
10.7 Conclusions
Studying the cardiovascular system and capturing the essence of blood circulation
requires coping with the complexity of such a biological fluid as much as with the
details of the anatomy under study. From the computational standpoint, taming
such complexity is not a trivial task, as it requires handling several computational
actors. Choosing the right computational framework, therefore, is a delicate issue