Image Processing - Practical Rendering and Computation with Direct3D 11

Like the brute-force Gaussian approach, this implementation performs a number of operations roughly proportional to the number of samples in the filter kernel. For our 7×7 filter example, the core of the algorithm executes 49 times per pixel, with each iteration reading from a device memory resource and then performing some computation on the fetched data; finally, the result is written to the output resource in device memory. Clearly, this is not an optimal solution, since each memory access can introduce a significant delay while the algorithm waits for the GPU memory system to fetch the requested data.
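To make that cost concrete, here is a minimal CPU-side sketch of the brute-force bilateral filter in C++. It is an illustration, not the book's HLSL shader: it assumes a single-channel float image (the shader works on float4 colors), clamp-to-edge addressing, and Gaussian spatial and range weights controlled by hypothetical parameters `sigmaSpatial` and `sigmaRange`. The `Image` type and the `reads` counter are illustrative additions that tally how many kernel samples are fetched.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image, row-major; stands in for the float4 texture.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;

    // Clamp-to-edge read (an assumption of this sketch).
    float at(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Brute-force 2D bilateral filter: every output pixel visits all
// (2 * radius + 1)^2 neighbors, so a 7x7 kernel (radius 3) costs 49 reads.
// If `reads` is non-null it receives the number of kernel-sample fetches.
Image bilateralBruteForce(const Image& in, int radius, float sigmaSpatial,
                          float sigmaRange, std::size_t* reads = nullptr) {
    assert(radius >= 0 && sigmaSpatial > 0.0f && sigmaRange > 0.0f);
    Image out{in.width, in.height, std::vector<float>(in.pixels.size())};
    const float spatialDenom = 2.0f * sigmaSpatial * sigmaSpatial;
    const float rangeDenom = 2.0f * sigmaRange * sigmaRange;
    std::size_t count = 0;
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            const float center = in.at(x, y);
            float sum = 0.0f;
            float weightSum = 0.0f;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    const float sample = in.at(x + dx, y + dy);  // one memory read
                    ++count;
                    const float diff = sample - center;
                    // Spatial weight (distance) times range weight (intensity difference).
                    const float w =
                        std::exp(-static_cast<float>(dx * dx + dy * dy) / spatialDenom) *
                        std::exp(-(diff * diff) / rangeDenom);
                    sum += w * sample;
                    weightSum += w;
                }
            }
            // weightSum >= 1 because the center sample always has weight 1.
            out.pixels[static_cast<std::size_t>(y) * in.width + x] = sum / weightSum;
        }
    }
    if (reads != nullptr) {
        *reads = count;
    }
    return out;
}
```

Run on a 640×480 image with `radius = 3`, this sketch performs 640 × 480 × 49 = 15,052,800 kernel-sample reads.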

Separable Bilateral Filter

In our Gaussian filter implementation, we used the separable nature of the filter to reduce the amount of work required to process each pixel. This allowed us to perform the algorithm in two passes, which not only reduces the number of operations, but also lets us use group shared memory to reduce the number of device memory accesses even further. Overall, this provides a significant performance improvement over the naive implementation. Ideally, we would like to use a similar technique to reduce the number of calculations and memory accesses needed for the bilateral filter as well.
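The saving comes from the fact that the 2D Gaussian kernel factors into a product of two 1D Gaussians. Writing the kernel radius as r (so the 7×7 example has r = 3; the symbol r is introduced here for illustration), the per-pixel sample count drops from a square to a sum:

$$
G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} = g(x)\, g(y),
\qquad
g(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{t^2}{2\sigma^2}}
$$

$$
\underbrace{(2r + 1)^2}_{\text{one 2D pass}} = 49
\qquad\text{versus}\qquad
\underbrace{2\,(2r + 1)}_{\text{two 1D passes}} = 14
\qquad (r = 3)
$$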

However, as we mentioned earlier, the bilateral filter is nonlinear and is generally not separable. Strictly speaking, this means that we cannot apply the same style of optimization to the bilateral filter. Nevertheless, in many cases a separable implementation is still usable, even though it is not mathematically exact. The resulting output image will not be identical to that of the basic implementation, but since this filter performs a blurring operation, small deviations from the true algorithm are less noticeable. The performance benefits of a separable filter generally outweigh the imperfect results, and we will make this tradeoff in the next implementation.
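The bilateral filter resists factoring because of its range term: the weight of a neighbor q of pixel p is exp(-|q - p|² / (2σ_s²)) · exp(-(I(q) - I(p))² / (2σ_r²)), and the second factor couples the two pixel values instead of depending on x and y independently. The separable approximation simply runs a 1D bilateral filter horizontally and then vertically. The following C++ sketch illustrates that idea on the CPU; it is not the book's shader, and the single-channel image, clamp-to-edge addressing, and the hypothetical `sigmaSpatial`/`sigmaRange` parameters are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image, row-major, with clamp-to-edge reads.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;

    float at(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// One 1D bilateral pass along (stepX, stepY): (1, 0) is horizontal, (0, 1) vertical.
// Each output pixel reads 2 * radius + 1 samples; `reads` accumulates the total.
Image bilateralPass(const Image& in, int stepX, int stepY, int radius,
                    float sigmaSpatial, float sigmaRange, std::size_t& reads) {
    Image out{in.width, in.height, std::vector<float>(in.pixels.size())};
    const float spatialDenom = 2.0f * sigmaSpatial * sigmaSpatial;
    const float rangeDenom = 2.0f * sigmaRange * sigmaRange;
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            const float center = in.at(x, y);
            float sum = 0.0f;
            float weightSum = 0.0f;
            for (int i = -radius; i <= radius; ++i) {
                const float sample = in.at(x + i * stepX, y + i * stepY);
                ++reads;
                const float diff = sample - center;
                const float w = std::exp(-static_cast<float>(i * i) / spatialDenom) *
                                std::exp(-(diff * diff) / rangeDenom);
                sum += w * sample;
                weightSum += w;
            }
            out.pixels[static_cast<std::size_t>(y) * in.width + x] = sum / weightSum;
        }
    }
    return out;
}

// Separable approximation: a horizontal pass, then a vertical pass over its output.
// The vertical pass compares against already-filtered values, which is one
// reason the result differs from the true 2D bilateral filter.
Image bilateralSeparable(const Image& in, int radius, float sigmaSpatial,
                         float sigmaRange, std::size_t* reads = nullptr) {
    assert(radius >= 0 && sigmaSpatial > 0.0f && sigmaRange > 0.0f);
    std::size_t count = 0;
    const Image horizontal =
        bilateralPass(in, 1, 0, radius, sigmaSpatial, sigmaRange, count);
    Image result = bilateralPass(horizontal, 0, 1, radius, sigmaSpatial, sigmaRange, count);
    if (reads != nullptr) {
        *reads = count;
    }
    return result;
}
```

For a 7×7 kernel this sketch reads 14 samples per pixel, split across two passes, instead of 49 in a single pass.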

Having decided to use a separable version of the filter, we set up the algorithm in much the same way as the separable Gaussian filter, except for the new per-sample calculations that the bilateral filter requires. We can also use group shared memory to cache the required color values, just as in the Gaussian implementation. The rest of the filter is unchanged, except that the algorithm now requires two passes instead of the single pass used before. The updated filter implementation is shown in Listing 10.5.
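The benefit of the group shared memory cache can be imitated on the CPU. In this hedged C++ sketch (not the book's shader), each simulated thread group handles a hypothetical `groupSize`-pixel segment of a row: it first copies that segment plus a `radius`-pixel apron on each side into a local buffer standing in for `groupshared` storage, and then every pixel of the segment filters from that buffer. Only the cache fill touches the input image, so a segment costs `groupSize + 2 * radius` device reads instead of `groupSize * (2 * radius + 1)`. The single-channel image, clamp-to-edge addressing, and Gaussian weights are assumptions of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image, row-major, with clamp-to-edge reads.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;

    float at(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Horizontal bilateral pass that emulates a groupshared row cache.
// `deviceReads` counts only the fetches used to fill the cache.
Image horizontalPassCached(const Image& in, int groupSize, int radius,
                           float sigmaSpatial, float sigmaRange,
                           std::size_t& deviceReads) {
    assert(groupSize > 0 && radius >= 0 && sigmaSpatial > 0.0f && sigmaRange > 0.0f);
    Image out{in.width, in.height, std::vector<float>(in.pixels.size())};
    const float spatialDenom = 2.0f * sigmaSpatial * sigmaSpatial;
    const float rangeDenom = 2.0f * sigmaRange * sigmaRange;
    std::vector<float> cache(static_cast<std::size_t>(groupSize + 2 * radius));
    for (int y = 0; y < in.height; ++y) {
        for (int start = 0; start < in.width; start += groupSize) {
            // Phase 1: load the segment plus its apron once (clamped at the edges).
            for (int i = 0; i < groupSize + 2 * radius; ++i) {
                cache[i] = in.at(start - radius + i, y);
                ++deviceReads;
            }
            // On the GPU, GroupMemoryBarrierWithGroupSync() would separate the phases.
            // Phase 2: filter every pixel of the segment from the cache only.
            const int end = std::min(start + groupSize, in.width);
            for (int x = start; x < end; ++x) {
                const int c = x - start + radius;  // center index in the cache
                const float center = cache[c];
                float sum = 0.0f;
                float weightSum = 0.0f;
                for (int i = -radius; i <= radius; ++i) {
                    const float sample = cache[c + i];
                    const float diff = sample - center;
                    const float w = std::exp(-static_cast<float>(i * i) / spatialDenom) *
                                    std::exp(-(diff * diff) / rangeDenom);
                    sum += w * sample;
                    weightSum += w;
                }
                out.pixels[static_cast<std::size_t>(y) * in.width + x] = sum / weightSum;
            }
        }
    }
    return out;
}
```

With `groupSize = 32` and `radius = 3`, each segment costs 38 device reads rather than 224.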

// Declare the input and output resources
Texture2D<float4> InputMap : register( t0 );
RWTexture2D<float4> OutputMap : register( u0 );

// Image sizes
#define size_x 640
#define size_y 480
