Non-separable 2D, 3D, and 4D Filtering with CUDA - GPU Pro: Advanced Rendering Techniques


with an FFT approach, since bilateral filtering in its original form is a nonlinear operation and the Fourier transform is linear. For optimal performance, FFTs often also require that the dimensions of the data are a power of 2. Additionally, there is no direct support for 4D FFTs in the Nvidia CUFFT library. Instead, one has to apply two batches of 2D FFTs and change the order of the data between these (since the 2D FFTs are applied along the first two dimensions).
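The reordering step between the two FFT batches can be illustrated with a small host-side sketch. It assumes a 4D volume stored with x as the fastest-varying index, and it uses a hypothetical helper, swapDimensionPairs, which is not part of CUFFT or of the chapter's code. The FFT calls themselves are omitted, and plain floats are used for brevity where real FFT data would be complex.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only, not a CUFFT function).
// Input layout:  index = x + X * (y + Y * (z + Z * t))   (x fastest)
// Output layout: index = z + Z * (t + T * (x + X * y))   (z fastest)
// After the call, (z, t) are the first two dimensions, so a second batch
// of 2D transforms can be applied along them.
std::vector<float> swapDimensionPairs(const std::vector<float>& in,
                                      std::size_t X, std::size_t Y,
                                      std::size_t Z, std::size_t T)
{
    std::vector<float> out(in.size());
    for (std::size_t t = 0; t < T; ++t) {
        for (std::size_t z = 0; z < Z; ++z) {
            for (std::size_t y = 0; y < Y; ++y) {
                for (std::size_t x = 0; x < X; ++x) {
                    const std::size_t src = x + X * (y + Y * (z + Z * t));
                    const std::size_t dst = z + Z * (t + T * (x + X * y));
                    out[dst] = in[src];
                }
            }
        }
    }
    return out;
}
```

Under these assumptions, a 4D transform would run one batch of Z * T 2D FFTs over (x, y), call swapDimensionPairs(data, X, Y, Z, T), and then run a second batch of X * Y 2D FFTs over (z, t). Calling the helper again with the sizes given as (Z, T, X, Y) restores the original layout. On the GPU, this permutation would typically be a kernel of its own.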

5.4 Previous Work

A substantial body of work has addressed the acceleration of filtering using GPUs. Two of the first examples are the work by Rost, who used OpenGL for 2D convolution [Rost 96], and Hopf and Ertl, who used a GPU for separable 3D convolution [Hopf and Ertl 99]. For game programming, filtering can be used for texture animation [James 01]. A more recent example is a white paper from Nvidia discussing separable 2D convolution [Podlozhnyuk 07]. See our recent review about GPUs in medical imaging for a more extensive overview of GPU-based filtering [Eklund et al. 13]. GPU implementations of non-separable filtering in 3D, and especially 4D, are less common. For example, the NPP (Nvidia Performance Primitives) library contains functions for image processing, but for convolution it only supports 2D data and filters stored as integers. The CUDA SDK contains two examples of separable 2D convolution, one example of FFT-based filtering in 2D, and a single example of separable 3D convolution.

The main purpose of this chapter is therefore to present optimized solutions for non-separable 2D, 3D, and 4D convolution with the CUDA programming language, using floats and the fast shared memory. Our code has already been used successfully in a number of applications [Eklund et al. 10, Forsberg et al. 11, Eklund et al. 11, Eklund et al. 12]. The implementations presented here have been made with CUDA 5.0 and are optimized for the Nvidia GTX 680 graphics card. Readers are assumed to be familiar with CUDA programming, and may avail themselves of the many books available on this topic if not (e.g., [Sanders and Kandrot 11]). All the code for this chapter is available under GNU GPL 3 at https://github.com/wanderine/NonSeparableFilteringCUDA.

5.5 Non-separable 2D Convolution

Two-dimensional convolution between a signal s and a filter f can be written for position [x, y] as

$$
(s * f)[x, y] = \sum_{f_x = -N/2}^{N/2} \; \sum_{f_y = -N/2}^{N/2} s[x - f_x, y - f_y] \cdot f[f_x, f_y], \tag{5.1}
$$

where N + 1 is the filter size. The most important aspect for a GPU implementation is that the convolution can be done independently for each pixel. To
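As a point of comparison, Equation (5.1) can be sketched as a plain CPU reference, which may be useful for validating a GPU kernel. This is a minimal sketch, not the chapter's implementation. The function name convolve2D, the row-major storage, and the zero treatment of samples outside the image are assumptions made for illustration. Because each output pixel is computed independently, the body of the two outer loops is the work that a single GPU thread would perform.

```cpp
#include <cstddef>
#include <vector>

// CPU reference for Eq. (5.1); hypothetical helper for illustration.
// Images are stored row-major: index = x + width * y.
// The filter holds (N + 1) x (N + 1) coefficients with N even, so the
// filter coordinate fx in [-N/2, N/2] is stored at column fx + N/2
// (and likewise for fy).
// Assumption (not from the text): samples outside the image count as zero.
std::vector<float> convolve2D(const std::vector<float>& s, int width, int height,
                              const std::vector<float>& f, int N)
{
    const int half = N / 2;
    const int filterSize = N + 1;
    std::vector<float> out(s.size(), 0.0f);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Everything inside this loop body depends only on (x, y).
            float sum = 0.0f;
            for (int fy = -half; fy <= half; ++fy) {
                for (int fx = -half; fx <= half; ++fx) {
                    const int sx = x - fx;
                    const int sy = y - fy;
                    if (sx < 0 || sx >= width || sy < 0 || sy >= height) {
                        continue; // outside the image: contributes zero
                    }
                    sum += s[sx + width * sy] *
                           f[(fx + half) + filterSize * (fy + half)];
                }
            }
            out[x + width * y] = sum;
        }
    }
    return out;
}
```

Convolving an image that contains a single unit impulse should reproduce the filter coefficients around the impulse position, which gives a quick sanity check for any faster implementation.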
