Biomedical Engineering Reference
Figure 8.4 Stacking phenomenon. (a) Mammary images contain sparse adipose tissue interspersed
with duct structures. (b) Nonrigid registration of mammary images by corresponding duct centroids
causes severe structural deformation: all components of the duct trajectory within the sectioning
planes are eliminated, producing vertical columnar structures. (c) This can be corrected by rigidly
registering the sequence, tracking the duct centroid trajectories, smoothing these trajectories, and
nonrigidly registering the duct centroids to the smoothed trajectories.
8.4 High-Performance Implementation
The size of high-resolution microscope image datasets presents a computational
challenge for automatic registration. Scanning slides with a 20X objective lens
produces images with submicron pixel spacing, often reaching the gigapixel scale
with tens of thousands of pixels in each dimension. At this scale, the amount of
data needed for quantitative phenotyping studies can easily extend into the
terabytes. This motivates a high-performance computing approach to registration.
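To make the scale concrete, the following back-of-the-envelope sketch estimates uncompressed storage. The slide dimensions (50,000 by 40,000 pixels), the 3-byte RGB pixel, and the count of 500 sections are illustrative assumptions, not values taken from the text.

```python
def image_bytes(width_px, height_px, bytes_per_pixel=3):
    """Uncompressed size of one image in bytes (assumes 8-bit RGB by default)."""
    return width_px * height_px * bytes_per_pixel

# Hypothetical slide: tens of thousands of pixels in each dimension.
WIDTH_PX, HEIGHT_PX = 50_000, 40_000
SECTIONS = 500  # hypothetical number of serial sections in one study

pixels = WIDTH_PX * HEIGHT_PX
per_image = image_bytes(WIDTH_PX, HEIGHT_PX)
total = per_image * SECTIONS

print(f"{pixels / 1e9:.1f} gigapixels per image")
print(f"{per_image / 1e9:.1f} GB per image (uncompressed)")
print(f"{total / 1e12:.1f} TB for {SECTIONS} sections")
```

Under these assumptions a single study reaches the terabyte range before any intermediate registration products are stored.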
This section discusses high-performance implementation of the two-stage
registration algorithm and introduces solutions at both the hardware and
software layers. At the hardware layer two areas are pursued: parallel systems
based on clusters of multisocket multicore CPUs, and GPUs for hardware
acceleration. At the software layer, performance libraries are used to compute
the normalized correlations by fast Fourier transform (FFT) on both CPU and GPU.
Performance results for the implementation variants discussed here are presented
in further detail in Section 8.6.
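The idea of computing correlations by FFT can be illustrated with a minimal one-dimensional sketch. It is a simplification under stated assumptions: a small pure-Python radix-2 FFT stands in for the optimized performance libraries, the correlation is circular with a single global normalization rather than the windowed normalized correlation an image registration would use, and the function names `fft`, `normalize`, and `circular_ncc` are hypothetical.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two.
    The inverse transform is returned unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def normalize(signal):
    """Subtract the mean and scale to unit Euclidean norm."""
    mean = sum(signal) / len(signal)
    centered = [v - mean for v in signal]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0:
        raise ValueError("constant signal cannot be normalized")
    return [v / norm for v in centered]

def circular_ncc(a, b):
    """Normalized circular cross-correlation of two equal-length signals.
    c[s] = sum_n a[(n + s) % N] * b[n], computed as IFFT(FFT(a) * conj(FFT(b)))."""
    fa, fb = fft(normalize(a)), fft(normalize(b))
    product = [p * q.conjugate() for p, q in zip(fa, fb)]
    n = len(a)
    return [v.real / n for v in fft(product, inverse=True)]

# Usage: recover a known circular shift from the correlation peak.
template = [0, 0, 1, 3, 1, 0, 0, 0]
shifted = template[-3:] + template[:-3]  # circular shift right by 3
scores = circular_ncc(shifted, template)
best_shift = max(range(len(scores)), key=scores.__getitem__)
print(best_shift, round(scores[best_shift], 3))
```

The appeal of the FFT route is cost: evaluating every shift directly takes on the order of N² operations per signal, whereas the transform-multiply-invert sequence takes on the order of N log N.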
8.4.1 Hardware Arrangement
Both single-node and multiple-node implementations are described below. In the
multiple-node implementations, a simple head/workers organization is assumed.
Communication within a single node is accomplished with Pthreads, and the
Message Passing Interface (MPI) is used for internode communication in
multiple-node implementations. The details of division of work between CPU and
GPU at the level of individual nodes are discussed further in Section 8.4.3.
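A head/workers organization can be sketched on a single node as follows. This is an illustration of the coordination pattern only: Python's `threading` and `queue` modules stand in for Pthreads, the MPI layer between nodes is not shown, and `register_pair`, `worker`, and `run_head` are hypothetical names with a placeholder in place of the actual registration.

```python
import queue
import threading

def register_pair(pair_index):
    """Hypothetical placeholder for registering one pair of adjacent images."""
    return pair_index, f"transform-{pair_index}"

def worker(tasks, results):
    """Pull pair indices until the head sends the None sentinel."""
    while True:
        pair_index = tasks.get()
        if pair_index is None:
            break
        results.put(register_pair(pair_index))

def run_head(num_pairs, num_workers=4):
    """Head: start workers, enqueue all work, then collect results in order."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for i in range(num_pairs):
        tasks.put(i)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # Workers finish in arbitrary order, so reorder results by pair index.
    collected = dict(results.get() for _ in range(num_pairs))
    return [collected[i] for i in range(num_pairs)]

transforms = run_head(num_pairs=6)
print(transforms)
```

In CPython the global interpreter lock prevents pure-Python threads from running CPU-bound work in parallel, so the sketch shows how work is distributed and gathered rather than how speedup is achieved.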
8.4.2 Workflow
The workflow for the two-stage registration algorithm is summarized in Figure
8.5. With the exception of computing nonrigid transformations, the CPU-bound
 