Figure 3.9 Magnified results on sample cells from the BAEC dataset. From left to right: Chan and Vese, GAC, GAC with adaptive force, the proposed method, and the manual ground truth.
Computing a global quantity such as the maximum over all texels is an expensive operation. Using parallel reduction, this kind of operation can be
performed efficiently on the GPU [57]. The original texture is passed to the shader program as a parameter, and the target rendering texture area is half its size. Every neighborhood of four texels is considered and evaluated for the operation of interest. As shown in Figure 3.10, the maximum value of each group of four texels is written to the target texture. The result is passed back to the shader, and the target render area is reduced by half again. This operation is performed log N times, where N is the width of the original texture, and the result is the single maximum value among the texels. A tree-based parallel reduction using the ping-pong method of texture
memory access is bandwidth bound. In CUDA, shared memory is used to reduce
the bandwidth requirement and, with sequential (instead of interleaved) addressing, is free of shared-memory bank conflicts. Parallel reduction of N elements requires log(N) steps that together perform the same O(N) operations as a sequential algorithm. With P processing cores (P physically parallel threads), the total time complexity is reduced to O(N/P + log N), compared to O(N) for a sequential reduction.
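A minimal CUDA sketch of such a shared-memory reduction is given below, here specialized to finding a maximum. The kernel and host names (maxReduce and the small driver in main), the element count, and the block size of 256 threads are illustrative choices rather than details of the original implementation; each launch reduces the data by a factor of 2 * blockDim.x, and a handful of launches isolate the single maximum, in line with the O(N/P + log N) bound above.

#include <cstdio>
#include <cstdlib>
#include <cfloat>
#include <cuda_runtime.h>

// One reduction pass: each block reduces 2 * blockDim.x input elements to a
// single partial maximum written to out[blockIdx.x]. The in-block tree uses
// sequential addressing, so shared-memory accesses are free of bank conflicts.
// blockDim.x must be a power of two.
__global__ void maxReduce(const float* in, float* out, int n)
{
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x * 2 + threadIdx.x;

    // Each thread loads two elements and combines them on the way in,
    // halving the number of blocks needed.
    float m = -FLT_MAX;
    if (i < n)              m = in[i];
    if (i + blockDim.x < n) m = fmaxf(m, in[i + blockDim.x]);
    sdata[tid] = m;
    __syncthreads();

    // Tree reduction in shared memory with sequential addressing.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] = fmaxf(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = sdata[0];
}

int main()
{
    const int N = 1 << 20;          // number of values to reduce
    const int threads = 256;

    // Host data with a known maximum for verification.
    float* h = (float*)malloc(N * sizeof(float));
    for (int i = 0; i < N; ++i) h[i] = (float)(i % 1000);
    h[12345] = 1.0e6f;

    float *dIn, *dOut;
    cudaMalloc(&dIn,  N * sizeof(float));
    cudaMalloc(&dOut, N * sizeof(float));   // generous scratch buffer
    cudaMemcpy(dIn, h, N * sizeof(float), cudaMemcpyHostToDevice);

    // Launch repeatedly, swapping the two buffers, until one value remains.
    int n = N;
    while (n > 1) {
        int blocks = (n + threads * 2 - 1) / (threads * 2);
        maxReduce<<<blocks, threads, threads * sizeof(float)>>>(dIn, dOut, n);
        float* tmp = dIn; dIn = dOut; dOut = tmp;
        n = blocks;
    }

    float result;
    cudaMemcpy(&result, dIn, sizeof(float), cudaMemcpyDeviceToHost);
    printf("max = %f\n", result);   // expected: 1000000.0
    cudaFree(dIn); cudaFree(dOut); free(h);
    return 0;
}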
Figure 3.10 An example of using reduction to find the maximum value in the texture. The lightest shaded texel from each set of four neighboring texels is kept until the single white texel is isolated.
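For comparison, the texture ping-pong scheme illustrated in Figure 3.10 can be sketched in CUDA as well, with two plain device buffers standing in for the source and target render textures; the original formulation uses a fragment shader with render-to-texture, so the kernel below (maxPool2x2, with an illustrative host driver) is only an analogue of that approach, and it assumes the texture width is a power of two.

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One ping-pong pass: each thread reads a 2x2 neighborhood of the w x w
// source buffer and writes the maximum of the four texels to the half-sized
// destination, as in Figure 3.10. w is assumed to be a power of two.
__global__ void maxPool2x2(const float* src, float* dst, int w)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int half = w / 2;
    if (x >= half || y >= half) return;

    float a = src[(2 * y)     * w + 2 * x];
    float b = src[(2 * y)     * w + 2 * x + 1];
    float c = src[(2 * y + 1) * w + 2 * x];
    float d = src[(2 * y + 1) * w + 2 * x + 1];
    dst[y * half + x] = fmaxf(fmaxf(a, b), fmaxf(c, d));
}

// Host driver: repeat the pass log2(w) times, swapping ("ping-ponging") the
// source and destination buffers, until a single texel remains.
float textureMax(float* dA, float* dB, int w)
{
    dim3 block(16, 16);
    while (w > 1) {
        int half = w / 2;
        dim3 grid((half + block.x - 1) / block.x,
                  (half + block.y - 1) / block.y);
        maxPool2x2<<<grid, block>>>(dA, dB, w);
        float* tmp = dA; dA = dB; dB = tmp;
        w = half;
    }
    float m;
    cudaMemcpy(&m, dA, sizeof(float), cudaMemcpyDeviceToHost);
    return m;
}

int main()
{
    const int w = 512;                            // texture width
    float* h = (float*)malloc(w * w * sizeof(float));
    for (int i = 0; i < w * w; ++i) h[i] = (float)(i % 100);
    h[777] = 12345.0f;                            // known maximum

    float *dA, *dB;
    cudaMalloc(&dA, w * w * sizeof(float));
    cudaMalloc(&dB, w * w * sizeof(float) / 4);   // half-sized target suffices
    cudaMemcpy(dA, h, w * w * sizeof(float), cudaMemcpyHostToDevice);

    printf("max = %f\n", textureMax(dA, dB, w));  // expected: 12345.0
    cudaFree(dA); cudaFree(dB); free(h);
    return 0;
}

Because every pass of this variant is a separate launch that streams through global memory, it is bandwidth bound, which is exactly the cost the shared-memory reduction above avoids.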