Figure 10.9. A visualization of overlapping calculations in neighboring pixels.
accesses for increasing the number of GSM accesses, plus a memory barrier. However, the
group shared memory can be used to share more than cached resource data. We can further
modify our current implementation to allow threads to share the results of some calculations,
reducing the overall computational burden on the GPU. This may become beneficial
after an algorithm has been optimized to reduce memory accesses: if the computational portion of
the algorithm becomes proportionally large enough, it can be helpful to reduce the number of
arithmetic operations even further.
In the case of the Gaussian filter, we can take advantage of the fact that the two
individual 1D processing passes use filter weights that are symmetric. This, coupled with
our row-and-column based threading scheme, allows us to share intermediate calculations
between two different pixels. To aid in explaining this possibility, Figure 10.9 shows the
calculations needed for two pixels that are near each other. The key to this concept is that
any two pixels equidistant from a pixel between them will both multiply that center pixel
by the same filter weight.
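The shared term can be written out explicitly. The notation here is an assumption introduced for illustration and does not come from the original listings: $p_j$ is the input value of pixel $j$ along the current row or column, $w_k$ is the filter weight at offset $k$, $R$ is the filter radius, and $O_i$ is the filtered output of pixel $i$.

```latex
% One 1D pass of the filter, with symmetric weights:
O_i = \sum_{k=-R}^{R} w_k \, p_{i+k}, \qquad w_{-k} = w_k .

% For a center pixel c and a distance d with 1 \le d \le R:
%   the output of pixel c-d uses the term with k = +d,
%   the output of pixel c+d uses the term with k = -d.
\underbrace{w_{d}\, p_c}_{\text{term in } O_{c-d}}
  \;=\;
\underbrace{w_{-d}\, p_c}_{\text{term in } O_{c+d}}
```

Both outputs need the same product $w_d \, p_c$, so it only has to be computed once.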
To avoid duplicate calculations, we can have each thread precalculate the
weighted versions of its own pixel value and store them in the GSM before the memory barrier
is performed. The precalculated values can then be read by whichever thread needs to use
them, which should reduce the number of multiplications needed by a factor of
roughly 2. This requires additional group shared memory, as well as additional
reads and writes to that memory. Figure 10.10 shows the layout that will be
used to store the additional precalculated values; the modified compute shader program is
shown in Listing 10.3.
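The precalculation scheme described above can be tried out on the CPU before it is committed to a shader. The following is a minimal Python sketch, not the compute shader from Listing 10.3: a list of lists stands in for the group shared memory, the loop boundary between the two phases stands in for the memory barrier, and the function names, the clamped border handling, and the multiplication counters are all assumptions made for illustration.

```python
import math

def gaussian_weights(radius, sigma):
    """Normalized, symmetric 1D Gaussian weights for offsets -radius..radius."""
    raw = [math.exp(-(k * k) / (2.0 * sigma * sigma))
           for k in range(-radius, radius + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def blur_direct(row, weights):
    """Reference 1D pass: every output pixel multiplies all of its taps itself."""
    radius = len(weights) // 2
    last = len(row) - 1
    out, mults = [], 0
    for i in range(len(row)):
        acc = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), last)      # clamp at the borders (assumption)
            acc += weights[k + radius] * row[j]
            mults += 1
        out.append(acc)
    return out, mults

def blur_shared(row, weights):
    """Same pass, but each 'thread' precalculates its own weighted values once."""
    radius = len(weights) // 2
    last = len(row) - 1
    half = weights[radius:]                   # w_0 .. w_R; symmetry covers the rest

    # Phase 1: thread j stores w_d * p_j for d = 0..R in the stand-in for GSM.
    shared = [[w * p for w in half] for p in row]
    mults = len(row) * len(half)

    # (In a shader, the group memory barrier would sit between the two phases.)

    # Phase 2: each output only adds values precalculated by its neighbors.
    out = []
    for i in range(len(row)):
        acc = shared[i][0]
        for d in range(1, radius + 1):
            acc += shared[max(i - d, 0)][d] + shared[min(i + d, last)][d]
        out.append(acc)
    return out, mults

if __name__ == "__main__":
    row = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
    w = gaussian_weights(3, 1.5)
    direct, direct_mults = blur_direct(row, w)
    shared, shared_mults = blur_shared(row, w)
    print(direct_mults, shared_mults)
```

With a filter radius of 3, the direct version performs 7 multiplications per pixel and the shared version performs 4, which is where the factor of roughly 2 comes from. The cost is the extra storage: R + 1 precalculated values per pixel instead of a single cached value.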
Figure 10.10. The group shared memory layout for caching shared values.