with an FFT approach, since bilateral filtering in its original form is a nonlinear
operation and the Fourier transform is linear. For optimal performance, FFTs
often also require that the dimensions of the data are a power of 2. Additionally,
there is no direct support for 4D FFTs in the Nvidia CUFFT library. Instead, one
has to apply two batches of 2D FFTs and change the order of the data between
these (since the 2D FFTs are applied along the first two dimensions).
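The reordering step between the two batches of 2D FFTs can be sketched on the CPU as follows. This is only an illustration under stated assumptions, not the chapter's implementation: the names `index4D` and `swapDimensionPairs` are hypothetical, the data is assumed to be stored as one linear array with x as the fastest-varying dimension, and the FFT calls themselves are left out. After the first batch has transformed the (x, y) planes, the data is permuted so that (z, t) become the first two dimensions and a second batch of 2D FFTs can be applied to them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: linear index into a 4D array whose first
// dimension (size n0) varies fastest in memory.
inline std::size_t index4D(std::size_t i0, std::size_t i1,
                           std::size_t i2, std::size_t i3,
                           std::size_t n0, std::size_t n1, std::size_t n2)
{
    return i0 + n0 * (i1 + n1 * (i2 + n2 * i3));
}

// Hypothetical helper: reorders data from (x, y, z, t) layout to
// (z, t, x, y) layout, so that a batched 2D FFT acting on the first
// two dimensions would now process the (z, t) planes.
template <typename Value>
std::vector<Value> swapDimensionPairs(const std::vector<Value>& in,
                                      std::size_t W, std::size_t H,
                                      std::size_t D, std::size_t T)
{
    assert(in.size() == W * H * D * T);
    std::vector<Value> out(in.size());
    for (std::size_t t = 0; t < T; ++t)
        for (std::size_t z = 0; z < D; ++z)
            for (std::size_t y = 0; y < H; ++y)
                for (std::size_t x = 0; x < W; ++x)
                    out[index4D(z, t, x, y, D, T, W)] =
                        in[index4D(x, y, z, t, W, H, D)];
    return out;
}
```

Calling the same function a second time with the sizes in the order (D, T, W, H) restores the original layout, which is what would be needed after the second batch of transforms.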
5.4 Previous Work
A substantial body of work has addressed the acceleration of filtering using GPUs.
Two of the first examples are the work by Rost, who used OpenGL for 2D convo-
lution [Rost 96], and Hopf and Ertl, who used a GPU for separable 3D convolu-
tion [Hopf and Ertl 99]. For game programming, filtering can be used for texture
animation [James 01]. A more recent example is a white paper from Nvidia dis-
cussing separable 2D convolution [Podlozhnyuk 07]. See our recent review about
GPUs in medical imaging for a more extensive overview of GPU-based filter-
ing [Eklund et al. 13]. GPU implementations of non-separable filtering in 3D,
and especially 4D, are less common. For example, the NPP (Nvidia Performance
Primitives) library contains functions for image processing, but for convolution it
only supports 2D data and filters stored as integers. The CUDA SDK contains
two examples of separable 2D convolution, one example of FFT-based filtering in
2D, and a single example of separable 3D convolution.
The main purpose of this chapter is therefore to present optimized solu-
tions for non-separable 2D, 3D, and 4D convolution with the CUDA program-
ming language, using floats and the fast shared memory. Our code has already
been successfully applied to a number of applications [Eklund et al. 10, Forsberg
et al. 11, Eklund et al. 11, Eklund et al. 12]. The implementations presented
here have been made with CUDA 5.0 and are optimized for the Nvidia GTX
680 graphics card. Readers are assumed to be familiar with CUDA programming;
those who are not can consult one of the many books available on this topic
(e.g. [Sanders and Kandrot 11]). All the code for this chapter is available under
GNU GPL 3 at https://github.com/wanderine/NonSeparableFilteringCUDA.
5.5 Non-separable 2D Convolution
Two-dimensional convolution between a signal s and a filter f can be written for
position [x, y] as
$$
(s * f)[x, y] = \sum_{f_x = -N/2}^{N/2} \; \sum_{f_y = -N/2}^{N/2} s[x - f_x,\, y - f_y] \cdot f[f_x, f_y], \tag{5.1}
$$
where N + 1 is the filter size. The most important aspect for a GPU imple-
mentation is that the convolution can be done independently for each pixel. To
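As a point of comparison for a GPU version, Equation (5.1) can be written as a plain CPU reference. The sketch below is a minimal illustration, not the chapter's code: the function name `convolve2D` is hypothetical, the signal is assumed to be stored row by row with x varying fastest, N is assumed to be even, and samples outside the image are assumed to be zero (the text above does not specify boundary handling). The body of the two outer loops depends only on (x, y), which is the per-pixel independence that allows one GPU thread to be assigned to each output pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for Equation (5.1).
// s: signal of size width * height, x varies fastest.
// f: filter of size (N + 1) * (N + 1), stored so that f[fx, fy]
//    lives at index (fx + N/2) + (N + 1) * (fy + N/2).
std::vector<float> convolve2D(const std::vector<float>& s, int width, int height,
                              const std::vector<float>& f, int N)
{
    assert(N >= 0 && N % 2 == 0);
    assert(s.size() == static_cast<std::size_t>(width) * height);
    assert(f.size() == static_cast<std::size_t>(N + 1) * (N + 1));

    const int half = N / 2;
    std::vector<float> out(s.size(), 0.0f);

    // Each (x, y) iteration is independent of all others.
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int fy = -half; fy <= half; ++fy) {
                for (int fx = -half; fx <= half; ++fx) {
                    const int sx = x - fx;
                    const int sy = y - fy;
                    // Assumption: samples outside the image count as zero.
                    if (sx < 0 || sx >= width || sy < 0 || sy >= height)
                        continue;
                    sum += s[sx + width * sy] *
                           f[(fx + half) + (N + 1) * (fy + half)];
                }
            }
            out[x + width * y] = sum;
        }
    }
    return out;
}
```

Convolving a unit impulse with a filter reproduces the filter coefficients around the impulse position, which gives a simple check when validating a faster implementation against this reference.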