Scalable Image Registration and 3D Reconstruction at Microscopic Resolution - High-Throughput Image Reconstruction and Analysis

Nonrigid Stage

While the primary effort is focused on improving intensity feature matching performance, simple steps can also be taken to speed up reading images from disk, grayscale conversion, and intensity feature extraction.

Given the large size of microscope images, some in excess of 10 GB, reading from disk and decoding can require a considerable amount of time. A parallel file system may be employed to reduce this time, although this requires distributing large amounts of data over a network and can complicate later steps, since the data will be distributed among several nodes rather than held on a single head node. A portion of the time spent reading and decoding can be hidden, however, by overlapping reading/decoding with grayscale conversion: the head node reads and decodes incrementally and uses asynchronous communication to defer grayscale conversion of each incremental read to the worker nodes.
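
The overlap described above can be illustrated on a single machine. This is only a minimal sketch: a thread pool stands in for the worker nodes and their asynchronous communication, an in-memory list of RGB rows stands in for the image file, and the Rec. 601 luma weights are an assumption, since the text does not say which grayscale conversion is used. The names `read_strips`, `to_gray`, and `load_grayscale` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def read_strips(image, rows_per_strip):
    """Stand-in for incremental read/decode: yield a few rows at a time."""
    for start in range(0, len(image), rows_per_strip):
        yield image[start:start + rows_per_strip]


def to_gray(strip):
    """Convert rows of (r, g, b) pixels to luma (Rec. 601 weights, assumed)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in strip]


def load_grayscale(image, rows_per_strip=2, workers=4):
    """Hand each strip to a worker as soon as it is read, then keep reading."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # submit() returns immediately, so the next strip is read while
        # earlier strips are still being converted.
        pending = [pool.submit(to_gray, strip)
                   for strip in read_strips(image, rows_per_strip)]
        gray = []
        for future in pending:  # collect results in read order
            gray.extend(future.result())
    return gray
```

In a cluster setting the `submit` call would correspond to a nonblocking send from the head node, but that mapping is an interpretation rather than something the text specifies.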

With the grayscale base and float images in memory, the next step is to determine which template regions will serve as candidates for intensity feature matching. The process is simple: the head node divides the base image among the worker nodes, which compute the variances of the W_1 × W_1 template-sized tiles of their portions and return the results.
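
As a rough illustration of the per-worker computation, the sketch below computes the variance of each full template-sized tile in a small 2D list. The selection rule (keeping tiles whose variance meets a threshold) and the names `tile_variances`, `candidate_tiles`, and `min_variance` are assumptions made for the example; the text only states that variances are computed and returned.

```python
def tile_variances(image, w1):
    """Return {(row, col): variance} for each full w1 x w1 tile of a 2D list."""
    rows, cols = len(image), len(image[0])
    result = {}
    for r in range(0, rows - w1 + 1, w1):
        for c in range(0, cols - w1 + 1, w1):
            pixels = [image[i][j]
                      for i in range(r, r + w1)
                      for j in range(c, c + w1)]
            mean = sum(pixels) / len(pixels)
            result[(r, c)] = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return result


def candidate_tiles(image, w1, min_variance):
    """Keep tiles with enough intensity variation (threshold is an assumption)."""
    return sorted(pos for pos, var in tile_variances(image, w1).items()
                  if var >= min_variance)
```

Flat tiles have zero variance and carry no structure for correlation to lock onto, which is presumably why variance is used as the screening measure.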

With a set of candidate intensity feature regions identified, what remains is to rotate them, extract their templates, and perform the correlations between the templates and their corresponding search areas. The candidate features are evenly divided among the worker nodes, which carry out these steps and return the maximum correlation magnitudes and their coordinates. The base image is stored in column-major format, so to keep communication to a minimum the candidate feature regions are buffered in order and the remainder of the image is discarded. Asynchronous communication is used to keep the head node busy while send operations post. The search windows, taken from the float image, are handled in a similar manner. However, since the search windows for distinct features can overlap significantly, they are not buffered individually; rather, their union is buffered as a whole.
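
To make the matching step concrete, here is a minimal sketch of what a worker returns for one template and search window pair. It evaluates the correlation directly in the spatial domain rather than with Fourier transforms, purely for readability, and it assumes a normalized cross-correlation score; neither choice is taken from the text, and rotation of the template is omitted. The function name `best_match` is hypothetical.

```python
import math


def best_match(template, search):
    """Slide template over search; return (max |NCC|, (row, col)) of best offset."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    t = [v for row in template for v in row]
    t_mean = sum(t) / len(t)
    t_dev = [v - t_mean for v in t]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    best = (0.0, (0, 0))
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            patch = [search[r + i][c + j]
                     for i in range(th) for j in range(tw)]
            p_mean = sum(patch) / len(patch)
            p_dev = [v - p_mean for v in patch]
            p_norm = math.sqrt(sum(d * d for d in p_dev))
            if t_norm == 0 or p_norm == 0:
                continue  # flat template or patch: score undefined, skip
            score = abs(sum(a * b for a, b in zip(t_dev, p_dev))
                        / (t_norm * p_norm))
            if score > best[0]:
                best = (score, (r, c))
    return best
```

The direct form costs on the order of the template area for every offset, which is the motivation for computing the same correlations with FFTs at realistic template sizes.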

The division of work in a single-node implementation of the nonrigid stage follows a strategy similar to that of the multiple-node implementation, except that no effort is made to overlap reading/decoding with grayscale conversion. In the case where GPU acceleration is used, intensity feature extraction proceeds sequentially, and as candidate features are identified they are passed to the GPU. This process is described in further detail in Section 8.4.3.

The discrete Fourier transforms necessary for calculating correlations on the CPU are performed using the FFT library FFTW [46]. The 2D-DFT dimensions are critical for performance; ideally the size of the padded transform, W_1 + W_2 − 1, is a power of two or, more generally, a product of small primes. For the cases when this size rule cannot be obeyed, FFTW provides a simple mechanism called a plan that specifies an optimized order of execution for the transformation. This plan is precomputed and subsequently reused, resulting in a one-time cost. For example, with a template size W_1 = 350 and a search window size W_2 = 700, FFTW takes around 0.7 second to compute the two 1,049 × 1,049 forward transforms without planning, whereas with a plan the computation takes only 0.32 second, with a 6-second one-time penalty.
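
The arithmetic behind this example can be checked with a short script. It is only a back-of-the-envelope sketch built from the figures quoted above; the helper names are hypothetical, and the break-even estimate assumes the quoted timings stay constant from one template/search pair to the next.

```python
import math


def padded_size(w1, w2):
    """Padded transform size for a w1 template and a w2 search window."""
    return w1 + w2 - 1


def prime_factors(n):
    """Prime factorization by trial division (fine for sizes this small)."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def break_even_pairs(plan_cost, unplanned, planned):
    """Smallest number of transform pairs at which planning pays for itself."""
    return math.ceil(plan_cost / (unplanned - planned))


n = padded_size(350, 700)
print(n, prime_factors(n))               # 1049 [1049]
print(break_even_pairs(6.0, 0.7, 0.32))  # 16
```

The padded size 1,049 is itself a large prime, so it falls outside the preferred sizes, and under these timings the 6-second planning cost is recovered after about 16 template/search pairs.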
