[Figure: bar chart comparing the three implementations (shared memory, shared memory unrolled, FFT); x-axis: filter size, y-axis: megavoxels per second, 0-800.]

Figure 5.10. Performance, measured in megavoxels per second, for the different implementations of 4D filtering, for a dataset of size 128 × 128 × 128 × 32 and filter sizes ranging from 3 × 3 × 3 × 3 to 17 × 17 × 17 × 17.
can, for example, handle larger datasets. In our work on 4D image denoising
[Eklund et al. 11], the FFT-based approach was on average only three times faster
(compared to about 30 times faster in the benchmarks given here). The main
reason for this was the high-resolution nature of the data (512 × 512 × 445 × 20
elements), making it impossible to load all the data into global memory. Due to
its higher memory consumption, the FFT-based approach was forced to load a
smaller number of slices into global memory compared to the spatial approach.
As only a subset of the slices (and time points) is valid after the filtering, the
FFT-based approach required a larger number of runs to process all the slices.
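The trade-off above is easy to quantify. A minimal sketch (the slice counts per chunk are illustrative assumptions, not numbers from the chapter): if a chunk of S slices is filtered with a filter spanning F slices, only S − F + 1 slices are valid per pass, so an approach that fits fewer slices into global memory needs more passes over the data.

```python
import math

def runs_needed(total_slices, chunk_slices, filter_slices):
    """Number of chunked filtering passes needed to cover all slices,
    given that only chunk_slices - filter_slices + 1 slices per pass
    are valid ('valid' convolution along the chunked dimension)."""
    valid_per_run = chunk_slices - filter_slices + 1
    if valid_per_run <= 0:
        raise ValueError("chunk too small for the filter")
    return math.ceil(total_slices / valid_per_run)

# Hypothetical chunk sizes: suppose the FFT-based approach fits 64
# slices per chunk while the spatial approach fits 128, with an
# 11-slice filter and the 445-slice dataset mentioned above.
print(runs_needed(445, 64, 11))   # FFT-based: 9 passes
print(runs_needed(445, 128, 11))  # spatial: 4 passes
```

Each extra pass re-reads overlapping slices from host memory, which is one way a large benchmark advantage for the FFT approach can shrink on high-resolution data.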
Finally, we close by noting two additional topics that readers may wish to consider for more advanced study. First, applications in which several filters are applied simultaneously to the same data (e.g., six complex-valued quadrature filters to estimate a local structure tensor in 3D) can lead to different conclusions regarding the performance of spatial convolution versus FFT-based filtering. Second, filter networks can be used to speed up spatial convolution by combining the results of many small filter kernels, yielding a proportionally higher gain for 3D and 4D than for 2D convolution [Andersson et al. 99, Svensson et al. 05]. All the code for this chapter is available under GNU GPL 3 at https://github.com/wanderine/NonSeparableFilteringCUDA.
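The dimensionality argument for filter networks can be sketched with a simple operation count. This is a simplified model (real filter networks also exploit kernel sparsity and structure): a cascade of n kernels of size k has effective support n(k − 1) + 1, so four 3^d kernels cover the same support as one dense 9^d kernel, and the saving grows with dimensionality d.

```python
def dense_cost(filter_size, dims):
    """Multiplications per output voxel for one dense,
    non-separable convolution of size filter_size**dims."""
    return filter_size ** dims

def network_cost(small_size, n_stages, dims):
    """Multiplications per output voxel for a cascade of n_stages
    small kernels (simplified filter-network model)."""
    return n_stages * small_size ** dims

# Cover a 9^d support with four 3^d kernels: 4*(3-1)+1 = 9.
for d in (2, 3, 4):
    print(d, dense_cost(9, d), network_cost(3, 4, d))
# d=2:   81 vs  36  (~2.3x)
# d=3:  729 vs 108  (~6.8x)
# d=4: 6561 vs 324  (~20x)
```

The ratio (k/n-th root aside) scales as k^d / (n k_small^d), which is why the gain is modest in 2D but substantial for 3D and 4D convolution.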