In our previous work [4], we proposed a number of improvements to the
NLMeans filter for denoising grayscale still images. The improvements
relevant to this paper are:
- An extension of NLMeans to correlated noise: although the original
NLMeans filter relies on a white Gaussian noise assumption, the power
spectral density of the noise in real images and video sequences is rarely
flat (see [23]).
- Acceleration techniques that exploit the symmetry in the weight computa-
tion and that compute the Euclidean distance between patches with a recursive
moving-average filter (a CPU-side sketch of this technique is given after this
list). These accelerations reduce the computation time by a factor of 121 (for
11 × 11 patches) without sacrificing any image quality.
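To make the second acceleration concrete, the sketch below illustrates the running-sum idea for a single patch displacement: first the per-pixel squared differences for that displacement are computed, then a separable recursive moving sum turns them into patch distances at a per-pixel cost that is independent of the patch size. This is a minimal illustration of the technique under clamped-border assumptions, not the code of [4]; all function and variable names (movingSum1D, patchDistances, ...) are ours.

#include <algorithm>
#include <vector>

// 1-D recursive moving sum over a window of 2*r+1 samples along a line with
// the given stride (stride = 1 for rows, stride = image width for columns).
// Each output sample costs one addition and one subtraction, independent of r.
static void movingSum1D(const float* in, float* out, int n, int stride, int r)
{
    auto at = [&](int i) { return in[std::min(std::max(i, 0), n - 1) * stride]; };
    float s = 0.0f;
    for (int i = -r; i <= r; ++i) s += at(i);
    for (int i = 0; i < n; ++i) {
        out[i * stride] = s;
        s += at(i + r + 1) - at(i - r);  // slide the window by one sample
    }
}

// For one displacement (dx, dy), computes D(x, y) = sum over the (2r+1)x(2r+1)
// patch of (I(x+u, y+v) - I(x+dx+u, y+dy+v))^2 for every pixel, i.e. all patch
// distances needed for that displacement, in O(1) per pixel.  Because the
// weights are symmetric (w(p, q) = w(q, p)), the caller only needs to invoke
// this for half of the displacements in the search window.
std::vector<float> patchDistances(const std::vector<float>& img,
                                  int w, int h, int dx, int dy, int r)
{
    auto cx = [&](int x) { return std::min(std::max(x, 0), w - 1); };
    auto cy = [&](int y) { return std::min(std::max(y, 0), h - 1); };

    // Step 1: per-pixel squared differences for this displacement.
    std::vector<float> sq(w * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float d = img[y * w + x] - img[cy(y + dy) * w + cx(x + dx)];
            sq[y * w + x] = d * d;
        }

    // Step 2: separable moving sum, horizontal pass then vertical pass.
    std::vector<float> tmp(w * h), dist(w * h);
    for (int y = 0; y < h; ++y)
        movingSum1D(&sq[y * w], &tmp[y * w], w, 1, r);
    for (int x = 0; x < w; ++x)
        movingSum1D(&tmp[x], &dist[x], h, w, r);
    return dist;
}

With this reformulation, the per-pixel cost of comparing two N × N patches drops from N² operations to a small constant, which is consistent with the reported speed-up by a factor of 121 for 11 × 11 patches.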
In spite of the efforts of many researchers, and despite our own recent
improvements, the NLMeans algorithm is not well suited for real-time denoising
of video sequences on a CPU. Even with our improvements, denoising one
512 × 512 color image takes about 30 seconds with a modestly optimized,
single-threaded C++ implementation on a recent 2 GHz CPU. Consequently, this
technique is not applicable to, e.g., real-time video communication.
Nowadays, there is a trend toward the use of parallel processing architectures
to accelerate such processing. One example of such an architecture is the
graphics processing unit (GPU). Although the GPU is primarily designed for
rendering 3D scenes, advances in GPU hardware since the late 1990s have
enabled many researchers and engineers to use the GPU for more general
computations. This led to so-called GPGPU (General-Purpose computation on
GPUs) [24], and many approaches (e.g., based on OpenGL, DirectX, CUDA,
OpenCL, ...) exist to achieve GPGPU with existing GPU hardware. Also, because
the processing power of modern GPUs has increased tremendously in the last
decade (even for inexpensive GPUs a speed-up by a factor of 20× to 100× can be
expected) and is still improving, it becomes worthwhile to investigate which
video denoising methods can be implemented efficiently on a GPU.
Recently, a number of authors have implemented the NLMeans algorithm
on a GPU: in [25] a locally constant weight assumption is used in the GPU
implementation to speed up the basic algorithm. In [26], a GPU extension of the
NLMeans algorithm is proposed to denoise ultrasound images. In this approach,
the maximum patch size is limited by the amount of shared memory of the GPU.
In this paper, we focus on algorithmic acceleration techniques for the GPU
without sacrificing denoising quality, i.e., the GPU implementation computes
the exact NLMeans formula, and without patch size restrictions imposed by
the hardware. To do so, we first review how NLMeans-based algorithms can be
mapped onto parallel processing architectures (a sketch of this mapping is
given below). We will find that the core ideas of our NLMeans algorithmic
acceleration techniques are directly applicable, but that the algorithms
themselves need to be modified. With these modifications, we will see that
the resulting implementation can process DVD video in real time on a
mid-range GPU. Next, as a second contribution of this paper, we explain how
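For reference, the sketch below shows the most direct parallel mapping of NLMeans onto a GPU, as reviewed above: one CUDA thread per output pixel, each thread scanning its search window and accumulating exponentially weighted neighbours. This is an illustration only, not the implementation described in this paper; the kernel and all names in it (nlmeansNaiveKernel, searchRadius, patchRadius, hSq) are ours, and the inner patch loop still costs O(patch area) per candidate, which is precisely the cost that the moving-average reformulation avoids.

// Illustrative CUDA sketch (not the paper's implementation): one thread per
// output pixel; every thread is independent, so no synchronization is needed.
__global__ void nlmeansNaiveKernel(const float* in, float* out, int w, int h,
                                   int searchRadius, int patchRadius, float hSq)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float weightSum = 0.0f, valueSum = 0.0f;

    for (int dy = -searchRadius; dy <= searchRadius; ++dy)
        for (int dx = -searchRadius; dx <= searchRadius; ++dx) {
            // Squared Euclidean distance between the patches centered at
            // (x, y) and (x + dx, y + dy); image borders are clamped.
            float dist = 0.0f;
            for (int v = -patchRadius; v <= patchRadius; ++v)
                for (int u = -patchRadius; u <= patchRadius; ++u) {
                    int x1 = min(max(x + u, 0), w - 1);
                    int y1 = min(max(y + v, 0), h - 1);
                    int x2 = min(max(x + dx + u, 0), w - 1);
                    int y2 = min(max(y + dy + v, 0), h - 1);
                    float d = in[y1 * w + x1] - in[y2 * w + x2];
                    dist += d * d;
                }
            // Standard NLMeans weight: exp(-||patch difference||^2 / h^2).
            float wgt = __expf(-dist / hSq);
            int xc = min(max(x + dx, 0), w - 1);
            int yc = min(max(y + dy, 0), h - 1);
            weightSum += wgt;
            valueSum  += wgt * in[yc * w + xc];
        }

    out[y * w + x] = valueSum / weightSum;
}

Because every output pixel is computed independently, NLMeans parallelizes trivially in this form; the difficulty addressed in the remainder of the paper is that the per-thread patch loop repeats work that neighbouring threads could share, which is one reason why the moving-average and symmetry accelerations have to be reorganized for the GPU.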