Scalable Image Registration and 3D Reconstruction at Microscopic Resolution - High-Throughput Image Reconstruction and Analysis

Figure 8.16 (a) Scalability and (b) speedup for increasing numbers of nodes running the most work-intensive image from the mammary set.

than GPU-assisted executions, they are also more scalable on a larger number of nodes due to the communication bindings of paired CPU-GPU executions. This is confirmed in Figure 8.14, where the most work-intensive mammary image is tested for various combinations of CPUs and GPUs.

For increasing numbers of nodes, Figures 8.15 and 8.16 show a progressive reduction in execution times. For the most work-intensive mammary image, the speedup on 16 versus 2 nodes is 7.4x with a 1 CPU configuration, whereas with a 2 CPU/GPU per node configuration it is slightly over 4x. The less effective internode parallelism of the more aggressive configurations is due in large part to their more demanding intranode communications.
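To put these speedups in perspective, they can be compared against the ideal linear speedup of 8x expected when going from 2 to 16 nodes. The short sketch below computes the resulting parallel efficiency. The function name `parallel_efficiency` is hypothetical, and the value 4.0 is an assumed lower bound standing in for "slightly over 4x".

```python
def parallel_efficiency(speedup: float, base_nodes: int, scaled_nodes: int) -> float:
    """Return the fraction of the ideal (linear) speedup that was achieved."""
    ideal_speedup = scaled_nodes / base_nodes
    return speedup / ideal_speedup


# Speedups quoted in the text for 16 versus 2 nodes.
# 4.0 is an assumed lower bound for "slightly over 4x".
reported = {"1 CPU per node": 7.4, "2 CPU/GPU per node": 4.0}

for config, speedup in reported.items():
    eff = parallel_efficiency(speedup, base_nodes=2, scaled_nodes=16)
    print(f"{config}: {speedup:.1f}x of an ideal 8x, efficiency {eff:.1%}")
```

Under these assumptions the 1 CPU configuration retains 92.5% of the ideal scaling, while the 2 CPU/GPU configuration retains roughly half, which is consistent with the heavier intranode communication noted above.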

8.7 Summary

The next generation of automated microscope imaging applications, such as quantitative phenotyping, requires the analysis of extremely large datasets, making scalability and parallelization of algorithms essential.

This chapter presents a fast, scalable, and easily parallelizable algorithm for image registration that is capable of correcting the nonrigid distortions of sectioned microscope images. Rigid initialization follows a simply reasoned process of matching high-level features that are quickly and easily extracted through standard image processing techniques. Nonrigid registration refines the result of rigid initialization, using the rigid estimates to guide the matching of intensity features with an FFT implementation of normalized cross-correlation.
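As an illustration of the FFT-based normalized cross-correlation mentioned above, the following is a minimal NumPy sketch, not the chapter's implementation. The function name `ncc_fft` is hypothetical, only fully overlapping template positions are scored, and windows with near-zero variance are assigned a score of zero by assumption.

```python
import numpy as np


def ncc_fft(image, template):
    """Normalized cross-correlation of `template` at every fully
    overlapping position in `image`, computed with FFTs."""
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    ih, iw = image.shape
    th, tw = template.shape
    # Pad to the full linear-convolution size to avoid circular wrap-around.
    shape = (ih + th - 1, iw + tw - 1)

    def correlate(a, kernel):
        # Correlation equals convolution with the flipped kernel,
        # which becomes a pointwise product in the Fourier domain.
        fa = np.fft.rfft2(a, s=shape)
        fk = np.fft.rfft2(kernel[::-1, ::-1], s=shape)
        full = np.fft.irfft2(fa * fk, s=shape)
        return full[th - 1:ih, tw - 1:iw]  # fully overlapping positions only

    t0 = template - template.mean()       # zero-mean template
    ones = np.ones_like(template)
    n = template.size

    num = correlate(image, t0)            # sum of I * (T - mean(T)) per window
    win_sum = correlate(image, ones)      # sum of I per window
    win_sq = correlate(image ** 2, ones)  # sum of I^2 per window
    # Sum of squared deviations from the window mean; clip round-off negatives.
    win_var = np.maximum(win_sq - win_sum ** 2 / n, 0.0)
    denom = np.sqrt(win_var * np.sum(t0 ** 2))

    out = np.zeros_like(num)
    np.divide(num, denom, out=out, where=denom > 1e-8)  # flat windows stay 0
    return out


# Usage: recover the location of a patch cut from a random image.
rng = np.random.default_rng(0)
img = rng.random((32, 40))
patch = img[10:18, 5:14].copy()
scores = ncc_fft(img, patch)
row, col = np.unravel_index(np.argmax(scores), scores.shape)
print(int(row), int(col))
```

Computing the three correlations in the Fourier domain replaces a per-pixel sliding window with a handful of transforms, so the cost depends only weakly on the template size.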

A computational framework for the two-stage algorithm is also provided, along with results from sample high-performance implementations. Two hardware-based solutions are presented for nonrigid feature matching: parallel systems and graphics processor acceleration. Scalability is demonstrated both on single-node systems, where GPUs and CPUs cooperate, and on multiple-node systems, where any variety of the single-node configurations can divide the work. From a departure point of 181 hours to run 500 mammary images on a single Opteron CPU, the GPU-accelerated parallel implementation is able to reduce this time to 3.7 hours
