Biomedical Engineering Reference
In-Depth Information
For example, if the binary operator is addition, then applying scan to the array
A = [9, 0, 2, 1, 8, 3, 3, 4] returns the array
scan(A) = [0, 9, 9, 11, 12, 20, 23, 26], in which each output element is the sum
of all input elements that precede it. Other common binary functions
include min, max, logical AND, and logical OR. The scan operator turns out
to be a surprisingly fundamental building block that is useful for a variety of
parallel algorithms including histograms, run-length encoding, box-filtering (summed
area tables), cross-correlation, distance transform, matrix operations, string
comparison, polynomial evaluation, solving recurrences, tree operations, radix sort,
quicksort, and lexical analysis. An extremely efficient GPU-based unsegmented
scan implementation that uses O(n) space to accomplish O(n) work is
given by Harris et al. [53]; it was further extended to segmented scans in [52].
Lectures and resource materials on GPU programming can be found at [54, 55].
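The scan example can be checked with a short CPU-side sketch in Python. The first function is a sequential reference. The second follows one well-known work-efficient formulation (an up-sweep followed by a down-sweep) and is offered as an illustration only; it is not claimed to be the exact implementation of [53]. The function names are hypothetical, and the second function assumes the input length is a power of two.

```python
import operator


def exclusive_scan(values, op=operator.add, identity=0):
    """Sequential reference: out[i] combines values[0..i-1]; out[0] is the identity."""
    out = []
    acc = identity
    for v in values:
        out.append(acc)
        acc = op(acc, v)
    return out


def work_efficient_scan(values, op=operator.add, identity=0):
    """Exclusive scan via up-sweep and down-sweep over an implicit binary tree.

    The iterations of each inner loop are independent of one another, so a GPU
    could execute them in parallel. Assumes len(values) is a power of two.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    data = list(values)

    # Up-sweep: build partial results in place, doubling the stride each step.
    stride = 1
    while stride < n:
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] = op(data[i - stride], data[i])
        stride *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    data[n - 1] = identity
    stride = n // 2
    while stride >= 1:
        for i in range(2 * stride - 1, n, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] = op(data[i], left)
        stride //= 2
    return data


A = [9, 0, 2, 1, 8, 3, 3, 4]
print(exclusive_scan(A))       # [0, 9, 9, 11, 12, 20, 23, 26]
print(work_efficient_scan(A))  # [0, 9, 9, 11, 12, 20, 23, 26]
```

Passing a different operator and its identity, for example `min` with `float("inf")`, gives the corresponding scan for the other binary functions mentioned above.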
Data Structures. The GPU memory model for textures is grid or array based
rather than linear, and data can be mapped accordingly for maximum efficiency.
One-dimensional arrays are limited by the maximum size of a texture (8,192
elements for the NVIDIA GeForce G80 processor) and also suffer from slower lookup
times in GPU memory because texture caching is optimized for two-dimensional
textures. Large one-dimensional arrays can be packed into two-dimensional
textures, and the shader program can perform the address translation to access the
data correctly. Two-dimensional floating point or integer data arrays map exactly
onto textures, with the exception of those that exceed the maximum allowable size
limit. In that case the array can be split among multiple textures, and correct
indexing is managed by setting multiple rendering targets. Three-dimensional (or
higher) arrays pose a difficulty: current GPUs can read the data as a texture,
but cannot directly generate x/y/z pixels for the pixel shader to operate
upon. Lefohn et al. [56] suggested three solutions: (1) use a 3D texture but only
operate on a single slice at a time; (2) store each slice as a separate texture in
graphics memory with the CPU activating the appropriate slice; or (3) place all
data slices into a single 2D texture, allowing for operation on the entire 3D array
in a single pass.
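The address translation for packed arrays can be sketched as plain index arithmetic. The Python below is a minimal illustration, assuming a row-major layout for the packed one-dimensional array and, for solution (3), slices tiled left to right and top to bottom in the 2D texture. The function names, the tiling order, and the parameter `tiles_per_row` are assumptions made for this example; a real shader would perform the same arithmetic on texture coordinates.

```python
def index_1d_to_2d(i, tex_width):
    """Map linear index i to (x, y) in a row-major 2D texture of width tex_width."""
    return i % tex_width, i // tex_width


def index_3d_to_flat_2d(x, y, z, slice_w, slice_h, tiles_per_row):
    """Map voxel (x, y, z) to (u, v) in a 2D texture that tiles the z-slices.

    Slice z occupies the tile at column z % tiles_per_row and row
    z // tiles_per_row; (x, y) is the offset inside that tile.
    """
    tile_col = z % tiles_per_row
    tile_row = z // tiles_per_row
    return tile_col * slice_w + x, tile_row * slice_h + y


# Element 10000 of a 1D array packed into a texture 8192 texels wide.
print(index_1d_to_2d(10000, 8192))  # (1808, 1)

# Voxel (3, 5, 6) of a volume with 64x64 slices tiled four per row.
print(index_3d_to_flat_2d(3, 5, 6, slice_w=64, slice_h=64, tiles_per_row=4))  # (131, 69)
```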
Framebuffer Objects. A typical rendering pass sends its results to the framebuffer
for display on the screen. Framebuffer Object (FBO) extensions were added to
the OpenGL specification to enable more efficient rendering directly to texture
memory rather than copying information from the framebuffer. The FBO acts
like a pointer to allocated areas of texture memory. A single FBO can be used to
manage multiple textures called attachments, and the number of attachments is
dependent on the GPU. Prior to executing a shader pass, an FBO attachment is
bound, which switches rendering from the screen framebuffer to texture memory.
After a rendering pass is completed, the result can be used as input for another
shader for further processing.
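The chaining of passes can be mimicked on the CPU. The sketch below is a plain Python stand-in, not OpenGL code: two lists play the role of texture attachments, each "shader" is an ordinary function evaluated once per output texel, and the read and write roles swap after every pass so that the output of one pass becomes the input of the next. All names (`run_pass`, `run_pipeline`, `blur`, `scale`) are hypothetical.

```python
def run_pass(shader, src):
    """Emulate one rendering pass: evaluate shader once per texel, reading only from src."""
    return [shader(src, i) for i in range(len(src))]


def blur(src, i):
    """Example shader: three-tap average with clamped borders."""
    left = src[max(i - 1, 0)]
    right = src[min(i + 1, len(src) - 1)]
    return (left + src[i] + right) / 3.0


def scale(src, i):
    """Example shader: double each texel."""
    return 2.0 * src[i]


def run_pipeline(texture, shaders):
    """Run the shaders in order, swapping two attachments between passes."""
    attachments = [list(texture), [0.0] * len(texture)]
    read = 0
    for shader in shaders:
        write = 1 - read  # render into the attachment not being read
        attachments[write] = run_pass(shader, attachments[read])
        read = write      # the result becomes the input of the next pass
    return attachments[read]


print(run_pipeline([3.0, 0.0, 3.0], [blur, scale]))  # [4.0, 4.0, 4.0]
```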
Parallel Reduction. The pixel shader does not have a mechanism for accumulating
a result like a sum or product of elements, or other binary associative operations
like minimum or maximum value over an array (texture). A solution is
to write the texture to CPU memory and iterate over the array of values, but this