Like the brute force Gaussian approach, this implementation performs a number
of operations roughly proportional to the number of samples used in the filter kernel. For
our 7×7 filter example, this means that the main body of the algorithm is executed
49 times, with each iteration performing a device memory resource read followed by some
computations on the data. Finally, the result is written to the device memory resource.
Clearly, this is not an optimal solution, since memory accesses can introduce significant
time delays while the algorithm waits for the requested data to be fetched by the GPU
memory system.
Separable Bilateral Filter
In our Gaussian filter implementation, we used the filter's separable nature to reduce the
amount of work required to process a pixel. This allows us to perform the algorithm in
two steps, which not only reduces the number of operations, but also lets us use the group
shared memory to reduce the number of device memory accesses even further. Overall, this
provides a significant performance improvement over the naive implementation. Ideally,
we would like to use a similar technique to reduce the number of calculations and memory
accesses needed for the bilateral filter as well.
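As a rough worked example of the savings described above, the sample counts for the 7×7 kernel can be compared directly. The symbols $N$, $C_{\text{brute}}$ and $C_{\text{separable}}$ are notation introduced here for illustration only. The comparison counts samples per output pixel and ignores the cost of writing and re-reading the intermediate image between the two passes, as well as any additional benefit from the group shared memory.

```latex
% N is the kernel width in samples (N = 7 in the running example).
\[
C_{\text{brute}} = N^2 = 7 \times 7 = 49,
\qquad
C_{\text{separable}} = N + N = 2N = 14,
\qquad
\frac{C_{\text{brute}}}{C_{\text{separable}}} = \frac{N}{2} = 3.5
\]
```

The ratio grows linearly with the kernel width, so under these assumptions the benefit of a separable implementation increases as the filter gets larger.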
However, as we mentioned earlier, the bilateral filter is nonlinear and is generally not
separable. Strictly speaking, this means that we would not be able to perform the same style
of optimizations with the bilateral filter. Nevertheless, in many cases it is still possible
to use a separable implementation, even though it is not mathematically correct. The
resulting output image will not be identical to the basic implementation, but since this filter
performs a blurring operation, it is less noticeable if the results are not precisely the same as
those of the true algorithm. The performance benefits of using a separable filter generally
outweigh the imperfect results, and we will make this tradeoff in the next implementation.
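To make the source of the error concrete, the following sketch writes out the commonly used form of the bilateral filter next to a two-pass approximation. The notation is an assumption made for this illustration ($I$ is the input image, $p$ the pixel being filtered, $q$ a neighboring sample, $G_s$ and $G_r$ the spatial and range Gaussians, $J$ the intermediate image) and may not match the symbols used elsewhere in the text.

```latex
% Full 2D bilateral filter over the neighborhood S(p):
\[
B_p = \frac{1}{W_p} \sum_{q \in S(p)}
      G_s\!\left(\lVert p - q \rVert\right)\,
      G_r\!\left(\lVert I_p - I_q \rVert\right)\, I_q,
\qquad
W_p = \sum_{q \in S(p)}
      G_s\!\left(\lVert p - q \rVert\right)\,
      G_r\!\left(\lVert I_p - I_q \rVert\right)
\]

% The spatial Gaussian factors into a horizontal and a vertical term:
\[
G_s\!\left(\lVert p - q \rVert\right) = G_s(p_x - q_x)\, G_s(p_y - q_y)
\]

% Two-pass approximation: a 1D bilateral filter along the row of p,
% then a 1D bilateral filter along the column of p on the result J.
\[
J_p = \frac{1}{W^{x}_p} \sum_{q \in \mathrm{row}(p)}
      G_s(p_x - q_x)\, G_r\!\left(\lVert I_p - I_q \rVert\right)\, I_q,
\qquad
\tilde{B}_p = \frac{1}{W^{y}_p} \sum_{q \in \mathrm{col}(p)}
      G_s(p_y - q_y)\, G_r\!\left(\lVert J_p - J_q \rVert\right)\, J_q
\]
```

Only the spatial term factors cleanly. The range term compares the center pixel with each sample individually, and in the second pass it is evaluated on the already filtered values $J$ rather than on the original image $I$, so in general $\tilde{B}_p \neq B_p$.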
After deciding to use a separable version of the filter, we will set up the algorithm
in much the same way as the separable Gaussian filter, with the exception of the new
per-sample calculations that are required for the bilateral filter. We can also use the group
shared memory to cache the required color values, just as we have seen in the Gaussian
implementation. The rest of the filter is unchanged, with the exception that we
must now execute two passes to perform the algorithm, instead of a single pass as before.
The updated filter implementation is shown in Listing 10.5.
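Before the full listing, the per-sample calculation can be looked at in isolation. The following is a minimal sketch only, not the code of Listing 10.5: the helper names BilateralWeight and FilterLine, the constants SigmaSpatial and SigmaRange, and their values are all hypothetical and chosen for illustration. It assumes the seven samples along one row or column have already been fetched, for example from the group shared memory.

```hlsl
// Illustrative constants (assumed values, not taken from the listing).
static const float SigmaSpatial = 2.0f;   // falloff with distance, in pixels
static const float SigmaRange   = 0.1f;   // falloff with color difference

// Hypothetical helper: weight of one sample relative to the center pixel.
// 'offset' is the signed distance in pixels along the filtered axis.
float BilateralWeight( float offset, float3 centerColor, float3 sampleColor )
{
    float3 diff = sampleColor - centerColor;

    // Spatial term: the same Gaussian falloff used by the Gaussian filter.
    float spatial = exp( -( offset * offset )
                         / ( 2.0f * SigmaSpatial * SigmaSpatial ) );

    // Range term: samples whose color differs from the center count less.
    float range = exp( -dot( diff, diff )
                       / ( 2.0f * SigmaRange * SigmaRange ) );

    return spatial * range;
}

// Hypothetical helper: one 1D pass over seven samples centered on samples[3].
float4 FilterLine( float4 samples[7] )
{
    float4 center    = samples[3];
    float4 sum       = 0.0f;
    float  weightSum = 0.0f;

    [unroll]
    for ( int i = 0; i < 7; i++ )
    {
        float w = BilateralWeight( float( i - 3 ), center.rgb, samples[i].rgb );
        sum       += samples[i] * w;
        weightSum += w;
    }

    // The center sample always has weight 1, so weightSum is never zero.
    return sum / weightSum;
}
```

In a two-pass arrangement, a helper of this kind would be applied once along rows and once along columns, with the second pass reading the output of the first.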
// Declare the input and output resources
Texture2D<float4> InputMap : register( t0 );
RWTexture2D<float4> OutputMap : register( u0 );
// Image sizes
#define size_x 640
#define size_y 480