Image Processing - Practical Rendering and Computation with Direct3D 11


it is not mandatory to precalculate the filter weights. Once the filtering kernel is available, we can easily use the compute shader to invoke a single thread for each pixel to be processed in the input image. We will first consider a naïve, brute-force approach, before moving on to a more elegant solution.
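The following Python snippet is a minimal sketch of how such weights could be produced. It builds a normalized 2D Gaussian kernel. The Gaussian form is an assumption suggested by the sigma parameter mentioned in Figure 10.5. The function name `gaussian_kernel` and its `radius` and `sigma` parameters are hypothetical, and the code is a CPU reference rather than shader code. A shader could just as well evaluate the same expression for each sample instead of reading a precomputed table.

```python
import math


def gaussian_kernel(radius, sigma):
    """Return a (2*radius+1) x (2*radius+1) Gaussian kernel normalized to sum to 1.

    Hypothetical helper: 'radius' and 'sigma' are illustrative parameters,
    not values taken from the text.
    """
    size = 2 * radius + 1
    kernel = [[0.0] * size for _ in range(size)]
    total = 0.0
    for y in range(-radius, radius + 1):
        for x in range(-radius, radius + 1):
            # The 1/(2*pi*sigma^2) constant is omitted because it cancels
            # in the normalization step below.
            w = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            kernel[y + radius][x + radius] = w
            total += w
    # Dividing by the total makes the weights add up to 1, so the filter
    # preserves the overall intensity of the image.
    return [[w / total for w in row] for row in kernel]
```

For example, `gaussian_kernel(2, 1.0)` returns a 5 x 5 kernel. Its entries sum to 1 up to floating-point rounding, and the largest weight sits at the center. Increasing `sigma` spreads the weight outward and lowers the center weight, which produces a stronger blur.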

One other consideration must be made when implementing the filter. When the kernel is centered on a pixel near an edge of the input image, some of its samples will fall at locations that are actually outside of the image. This must be considered and handled appropriately for a given situation, and it is typically compensated for in one of several ways. The exterior samples can be clamped to the edge of the image. Because the clamped samples repeat the values along the border, this essentially makes the edge pixels a little bit sharper than the interior pixels after the blurring process. Another option is to eliminate samples that fall outside of the image. To eliminate a sample, the algorithm must not only skip adding that weighted sample, but must also reduce the total weight accordingly. Since we choose the filter weights to add up to 1, the filter preserves the overall intensity of the image for regular, interior pixels. If we eliminate some samples, the remaining weights add up to less than 1. The resulting value must then be renormalized by dividing the output value by the new total of the filter weights. This adds some complexity to the calculation, and it must be implemented with care to ensure that the performance of the algorithm is not significantly reduced. The implementations demonstrated in this chapter simply neglect this effect, since the focus of the chapter is on using the compute shader in an efficient manner.
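The two edge-handling strategies can be compared with a small CPU reference sketch. The hypothetical function `filter_pixel` computes the output for one pixel of a grayscale image stored as a list of rows. That is the work a single compute shader thread would perform in the brute-force approach. The sketch assumes a square, odd-sized kernel whose weights sum to 1. The `"clamp"` and `"discard"` mode names are illustrative and not part of any API.

```python
def filter_pixel(image, x, y, kernel, mode="clamp"):
    """Return the filtered value of pixel (x, y) of a grayscale image.

    image  -- list of rows, indexed image[y][x]
    kernel -- square (2r+1) x (2r+1) list of weights assumed to sum to 1
    mode   -- "clamp": out-of-range taps read the nearest edge pixel
              "discard": out-of-range taps are skipped and the result is
              renormalized by the sum of the weights actually used
    """
    if mode not in ("clamp", "discard"):
        raise ValueError(f"unknown edge mode: {mode}")
    height, width = len(image), len(image[0])
    radius = len(kernel) // 2
    acc = 0.0         # weighted sum of samples
    weight_sum = 0.0  # sum of the weights that were actually applied
    for ky in range(-radius, radius + 1):
        for kx in range(-radius, radius + 1):
            sx, sy = x + kx, y + ky
            if not (0 <= sx < width and 0 <= sy < height):
                if mode == "discard":
                    continue  # drop both the sample and its weight
                # Clamp the sample location to the image border.
                sx = min(max(sx, 0), width - 1)
                sy = min(max(sy, 0), height - 1)
            w = kernel[ky + radius][kx + radius]
            acc += w * image[sy][sx]
            weight_sum += w
    if mode == "discard":
        # Fewer taps were used, so the weights no longer sum to 1: renormalize.
        return acc / weight_sum
    # Clamping applies every weight, so the total is already 1.
    return acc
```

As an example, take a 3 x 3 box kernel (all weights 1/9) and a 3 x 3 image whose only nonzero value is 9 at the top-left corner. For that corner pixel, clamping yields 4, because the corner value is replicated into four of the nine taps. Discarding keeps four taps with total weight 4/9 and renormalizes their sum of 1 to 2.25. Simply ignoring the missing samples without renormalizing would give 1.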

Figure 10.5. Visualizations of various filter kernels with increasing values of sigma, which will produce increasingly blurred output images.

Brute Force Approach

To invoke one compute shader thread for each pixel, we must select a thread group size that stays within the upper limit on the thread count (at most 1024 threads per group). The size must also allow a dispatch that covers the entire input image. For example, if we choose the thread groups to be of size [32,32,1], we are within the thread count limit, and our dispatch call can choose the appropriate number of thread groups to request based on
