it is not mandatory to precalculate the filter weights. Once the filtering kernel is
available, we can easily use the compute shader to invoke a single thread for each pixel to
be processed on the input image. We will first consider a naive, brute-force approach,
before moving on to a more elegant solution.
One other consideration must be made when implementing the filter. When it is applied near
an edge of the input image, the filter will sample image locations that are actually outside of
the image. This must be considered and handled appropriately for a given situation, and
is typically compensated for in one of several ways. The exterior samples can be clamped
to the edge of the image, which will essentially make the edge pixels a little bit sharper
than the interior pixels after the blurring process. Another way to handle this would be
to eliminate samples that fall outside of the image. To eliminate a sample, the algorithm
must not only skip the addition of the weighted sample, but must also exclude its weight from
the overall total. Since we choose the filter weights to add up to 1, we are performing an
implicitly neutral operation on the image in the regular, interior pixel cases. If we eliminate
some samples, the overall filter weight will add up to less than 1, and thus the resulting
value must be renormalized by dividing the output value by the sum of the remaining filter
weights. This adds some complexity to the calculation process, and must be implemented
with care to ensure that the performance of the algorithm is not significantly reduced. The
implementations demonstrated in this chapter simply neglect this effect, since the focus of
the chapter is to use the compute shader in an efficient manner.
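The two edge-handling strategies described above can be sketched on the CPU. The following is a minimal illustration in Python rather than shader code, and it is not the implementation used in this chapter. The function name filter_1d and the edge parameter are hypothetical names chosen for this example, and the filter is applied to a single row of pixel values to keep the sketch short.

```python
def filter_1d(pixels, weights, edge="clamp"):
    """Apply a 1D filter to one row of pixel values (illustrative sketch).

    edge="clamp": samples outside the row reuse the nearest edge pixel.
    edge="skip":  samples outside the row are dropped, and the result is
                  renormalized by the sum of the weights actually used.
    """
    radius = len(weights) // 2
    last = len(pixels) - 1
    out = []
    for x in range(len(pixels)):
        total = 0.0        # weighted sum of the samples
        used_weight = 0.0  # sum of the weights that contributed
        for k, w in enumerate(weights):
            sx = x + k - radius
            if sx < 0 or sx > last:
                if edge == "skip":
                    continue                 # eliminate the exterior sample
                sx = min(max(sx, 0), last)   # clamp to the edge pixel
            total += w * pixels[sx]
            used_weight += w
        # For interior pixels (and for clamping) used_weight is the full
        # kernel sum, so this division only changes the "skip" edge cases.
        out.append(total / used_weight)
    return out


row = [0.0, 4.0, 8.0]
kernel = [0.25, 0.5, 0.25]
clamped = filter_1d(row, kernel, edge="clamp")
skipped = filter_1d(row, kernel, edge="skip")
```

In this sketch the extra division is performed for every pixel, which is the kind of added cost the text warns about. A real implementation would more likely restrict the renormalization to pixels within one kernel radius of the border.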
Figure 10.5. Visualizations of various filter kernels with increasing values of
sigma, which will produce increasingly blurred output images.
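As a rough illustration of how kernels like those in the figure might be generated, the sketch below computes normalized one-dimensional weights in Python. It assumes the kernels are Gaussian, which the use of sigma suggests but this passage does not state outright. The function name gaussian_weights and its radius parameter are hypothetical choices for this example.

```python
import math


def gaussian_weights(radius, sigma):
    """Return 2 * radius + 1 Gaussian weights that sum to 1 (sketch)."""
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma))
           for i in range(-radius, radius + 1)]
    total = sum(raw)
    # Normalizing makes the filter a neutral operation on a flat image.
    return [w / total for w in raw]


narrow = gaussian_weights(3, 1.0)  # small sigma: weight concentrated at center
wide = gaussian_weights(3, 3.0)    # large sigma: flatter kernel, more blur
```

With a larger sigma the center weight shrinks and the outer weights grow, so each output pixel draws more heavily on its neighbors.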
Brute Force Approach
To invoke the compute shader once for each pixel, we must select a thread group size that is
within the upper limit of the thread count (at most 1024 threads per thread group), but that
can be invoked in a dispatch size that can cover the entire input image. For example, if we
choose the thread groups to be of size [32,32,1], we are within the thread count limit, and
our dispatch call can choose the appropriate number of thread groups to request based on