In our previous work [4], we proposed a number of improvements to the
NLMeans filter for denoising grayscale still images. The improvements
relevant to this paper are:
- An extension of NLMeans to correlated noise: although the original
NLMeans filter relies on a white Gaussian noise assumption, the power
spectral density of the noise in real images and video sequences is rarely
flat (see [23]).
- Acceleration techniques that exploit the symmetry in the weight computa-
tion and that compute the Euclidean distance between patches with a recursive
moving-average filter (a CPU-side sketch of this technique is given after this
list). These accelerations reduce the computation time by a factor of 121 (for
11 × 11 patches) without sacrificing any image quality.
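To make the second acceleration concrete, the sketch below illustrates the running-sum idea for a single patch displacement: first the per-pixel squared differences for that displacement are computed, then a separable recursive moving sum turns them into patch distances at a per-pixel cost that is independent of the patch size. This is a minimal illustration of the technique under clamped-border assumptions, not the code of [4]; all function and variable names (movingSum1D, patchDistances, ...) are ours.

#include <algorithm>
#include <vector>

// 1-D recursive moving sum over a window of 2*r+1 samples along a line with
// the given stride (stride = 1 for rows, stride = image width for columns).
// Each output sample costs one addition and one subtraction, independent of r.
static void movingSum1D(const float* in, float* out, int n, int stride, int r)
{
    auto at = [&](int i) { return in[std::min(std::max(i, 0), n - 1) * stride]; };
    float s = 0.0f;
    for (int i = -r; i <= r; ++i) s += at(i);
    for (int i = 0; i < n; ++i) {
        out[i * stride] = s;
        s += at(i + r + 1) - at(i - r);  // slide the window by one sample
    }
}

// For one displacement (dx, dy), computes D(x, y) = sum over the (2r+1)x(2r+1)
// patch of (I(x+u, y+v) - I(x+dx+u, y+dy+v))^2 for every pixel, i.e. all patch
// distances needed for that displacement, in O(1) per pixel.  Because the
// weights are symmetric (w(p, q) = w(q, p)), the caller only needs to invoke
// this for half of the displacements in the search window.
std::vector<float> patchDistances(const std::vector<float>& img,
                                  int w, int h, int dx, int dy, int r)
{
    auto cx = [&](int x) { return std::min(std::max(x, 0), w - 1); };
    auto cy = [&](int y) { return std::min(std::max(y, 0), h - 1); };

    // Step 1: per-pixel squared differences for this displacement.
    std::vector<float> sq(w * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float d = img[y * w + x] - img[cy(y + dy) * w + cx(x + dx)];
            sq[y * w + x] = d * d;
        }

    // Step 2: separable moving sum, horizontal pass then vertical pass.
    std::vector<float> tmp(w * h), dist(w * h);
    for (int y = 0; y < h; ++y)
        movingSum1D(&sq[y * w], &tmp[y * w], w, 1, r);
    for (int x = 0; x < w; ++x)
        movingSum1D(&tmp[x], &dist[x], h, w, r);
    return dist;
}

With this reformulation, the per-pixel cost of comparing two N × N patches drops from N² operations to a small constant, which is consistent with the reported speed-up by a factor of 121 for 11 × 11 patches.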
In spite of the efforts of many researchers, and despite our own recent
improvements, the NLMeans algorithm is not well suited for real-time denoising
of video sequences on a CPU. Even with our improvements, denoising one
512 × 512 color image takes about 30 seconds with a modestly optimized,
single-threaded C++ implementation on a recent 2 GHz CPU. Consequently, this
technique is not applicable to, e.g., real-time video communication.
Nowadays, there is a trend toward the use of parallel processing architectures
to accelerate such processing. One example of such an architecture is the
graphics processing unit (GPU). Although the GPU is primarily designed for
rendering 3D scenes, advances in GPU hardware since the late 1990s have
enabled many researchers and engineers to use the GPU for more general
computations. This led to so-called GPGPU (General-Purpose computation on
GPUs) [24], and many approaches (e.g., based on OpenGL, DirectX, CUDA,
OpenCL, ...) exist to achieve GPGPU with existing GPU hardware. Also, because
the processing power of modern GPUs has increased tremendously in the last
decade (even for inexpensive GPUs a speed-up by a factor of 20× to 100× can be
expected) and is still improving, it becomes worthwhile to investigate which
video denoising methods can be implemented efficiently on a GPU.
Recently, a number of authors have implemented the NLMeans algorithm
on a GPU: in [25] a locally constant weight assumption is used in the GPU
implementation to speed up the basic algorithm. In [26], a GPU extension of the
NLMeans algorithm is proposed to denoise ultrasound images. In this approach,
the maximum patch size is limited by the amount of shared memory of the GPU.
In this paper, we focus on algorithmic acceleration techniques for the GPU
without sacrificing denoising quality, i.e., the GPU implementation computes
the exact NLMeans formula, and without patch size restrictions imposed by
the hardware. To do so, we first review how NLMeans-based algorithms can be
mapped onto parallel processing architectures (a sketch of this mapping is
given below). We will find that the core ideas of our NLMeans algorithmic
acceleration techniques are directly applicable, but that the algorithms
themselves need to be modified. With these modifications, we will see that
the resulting implementation can process DVD video in real time on a
mid-range GPU. Next, as a second contribution of this paper, we explain how
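For reference, the sketch below shows the most direct parallel mapping of NLMeans onto a GPU, as reviewed above: one CUDA thread per output pixel, each thread scanning its search window and accumulating exponentially weighted neighbours. This is an illustration only, not the implementation described in this paper; the kernel and all names in it (nlmeansNaiveKernel, searchRadius, patchRadius, hSq) are ours, and the inner patch loop still costs O(patch area) per candidate, which is precisely the cost that the moving-average reformulation avoids.

// Illustrative CUDA sketch (not the paper's implementation): one thread per
// output pixel; every thread is independent, so no synchronization is needed.
__global__ void nlmeansNaiveKernel(const float* in, float* out, int w, int h,
                                   int searchRadius, int patchRadius, float hSq)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float weightSum = 0.0f, valueSum = 0.0f;

    for (int dy = -searchRadius; dy <= searchRadius; ++dy)
        for (int dx = -searchRadius; dx <= searchRadius; ++dx) {
            // Squared Euclidean distance between the patches centered at
            // (x, y) and (x + dx, y + dy); image borders are clamped.
            float dist = 0.0f;
            for (int v = -patchRadius; v <= patchRadius; ++v)
                for (int u = -patchRadius; u <= patchRadius; ++u) {
                    int x1 = min(max(x + u, 0), w - 1);
                    int y1 = min(max(y + v, 0), h - 1);
                    int x2 = min(max(x + dx + u, 0), w - 1);
                    int y2 = min(max(y + dy + v, 0), h - 1);
                    float d = in[y1 * w + x1] - in[y2 * w + x2];
                    dist += d * d;
                }
            // Standard NLMeans weight: exp(-||patch difference||^2 / h^2).
            float wgt = __expf(-dist / hSq);
            int xc = min(max(x + dx, 0), w - 1);
            int yc = min(max(y + dy, 0), h - 1);
            weightSum += wgt;
            valueSum  += wgt * in[yc * w + xc];
        }

    out[y * w + x] = valueSum / weightSum;
}

Because every output pixel is computed independently, NLMeans parallelizes trivially in this form; the difficulty addressed in the remainder of the paper is that the per-thread patch loop repeats work that neighbouring threads could share, which is one reason why the moving-average and symmetry accelerations have to be reorganized for the GPU.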