A GPU-Accelerated Real-Time NLMeans Algorithm for Denoising Color Video Sequences - Advanced Concepts for Intelligent Vision Systems


the convolution operator as a cascade of a horizontal and vertical filter. Then by setting $U_1^{(1)}(\mathbf{p}) = U_2^{(1)}(\mathbf{p}) = Y(\mathbf{p})$ and $U_3^{(1)}(\mathbf{p}) = 0$, the first pass of our algorithm is as follows:

$$
f^{(4i-3)}\left(U_1^{(4i-3)}(\mathbf{p}), \ldots, U_4^{(4i-3)}(\mathbf{p})\right) =
\begin{pmatrix}
U_1^{(4i-3)}(\mathbf{p}) \\
U_2^{(4i-3)}(\mathbf{p}) \\
U_3^{(4i-3)}(\mathbf{p}) \\
r^{(0,0)}_{\mathbf{p},\mathbf{q}_i}
\end{pmatrix}
\tag{10}
$$

Note that the values $U_1^{(4i-3)}(\mathbf{p})$, $U_2^{(4i-3)}(\mathbf{p})$, $U_3^{(4i-3)}(\mathbf{p})$ are simply passed to the next step of the algorithm. We only compute the Euclidean distance between two pixel intensities (in RGB color space). The next passes are given by:

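$$
f^{(4i-2)}\left(U_1^{(4i-2)}(\mathbf{p}), \ldots, U_4^{(4i-2)}(\mathbf{p})\right) =
\begin{pmatrix}
U_1^{(4i-2)}(\mathbf{p}) \\
U_2^{(4i-2)}(\mathbf{p}) \\
U_3^{(4i-2)}(\mathbf{p}) \\
\sum_{\Delta x \in [-B,\ldots,B]} U_4^{(4i-2)}(p_x + \Delta x,\, p_y,\, p_t)
\end{pmatrix},
$$

$$
f^{(4i-1)}\left(U_1^{(4i-1)}(\mathbf{p}), \ldots, U_4^{(4i-1)}(\mathbf{p})\right) =
\begin{pmatrix}
U_1^{(4i-1)}(\mathbf{p}) \\
U_2^{(4i-1)}(\mathbf{p}) \\
U_3^{(4i-1)}(\mathbf{p}) \\
g\left(\sum_{\Delta y \in [-B,\ldots,B]} U_4^{(4i-1)}(p_x,\, p_y + \Delta y,\, p_t)\right)
\end{pmatrix}.
\tag{11}
$$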

The separable filtering reduces the computational complexity by a factor of $(2B+1)/2$. Fortunately, the steps (11) are computationally simple and only require a small number of regular memory accesses, which can benefit from the internal memory caches of the GPU. Note that in the last step of (11), we already computed the similarity weights, by evaluating the function $g(\cdot)$.
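To make the passes (10) and (11) concrete, the following is a minimal CPU-side sketch in Python, not the GPU implementation. It assumes a single 2D frame (the temporal coordinate is dropped), clamped image borders, and an exponential kernel for g with a hypothetical bandwidth `h_param`; none of these choices is stated in the text above, and all function names are illustrative. The sketch also compares the separable block sum against a direct two-dimensional sum.

```python
import math

def clamp(v, lo, hi):
    """Clamp an index to [lo, hi] (border handling is an assumption)."""
    return max(lo, min(v, hi))

def pixel_distance(img, q):
    """First pass: squared RGB distance between pixel p and pixel p + q."""
    h, w = len(img), len(img[0])
    dy, dx = q
    return [[sum((a - b) ** 2
                 for a, b in zip(img[y][x],
                                 img[clamp(y + dy, 0, h - 1)][clamp(x + dx, 0, w - 1)]))
             for x in range(w)] for y in range(h)]

def sum_horizontal(r, B):
    """Second pass: sum over horizontal offsets in [-B, B]."""
    h, w = len(r), len(r[0])
    return [[sum(r[y][clamp(x + d, 0, w - 1)] for d in range(-B, B + 1))
             for x in range(w)] for y in range(h)]

def sum_vertical(r, B):
    """Third pass, inner part: sum over vertical offsets in [-B, B]."""
    h, w = len(r), len(r[0])
    return [[sum(r[clamp(y + d, 0, h - 1)][x] for d in range(-B, B + 1))
             for x in range(w)] for y in range(h)]

def sum_block_direct(r, B):
    """Reference: direct (2B+1) x (2B+1) block sum, used only for comparison."""
    h, w = len(r), len(r[0])
    return [[sum(r[clamp(y + j, 0, h - 1)][clamp(x + i, 0, w - 1)]
                 for j in range(-B, B + 1) for i in range(-B, B + 1))
             for x in range(w)] for y in range(h)]

def g(dist, h_param):
    """Assumed weighting function (exponential kernel); the paper's g may differ."""
    return math.exp(-dist / (h_param * h_param))

def similarity_weights(img, q, B, h_param):
    """Chain the three passes: distance, horizontal sum, vertical sum followed by g."""
    r = pixel_distance(img, q)
    block = sum_vertical(sum_horizontal(r, B), B)
    return [[g(v, h_param) for v in row] for row in block]

def make_test_image(h, w):
    """Small deterministic RGB image with integer channels (illustrative data only)."""
    return [[((3 * x + 5 * y) % 7, (x * y) % 5, (x + 2 * y) % 11)
             for x in range(w)] for y in range(h)]

if __name__ == "__main__":
    img = make_test_image(6, 8)
    r = pixel_distance(img, (1, -2))
    # The cascade of two 1D sums matches the direct 2D block sum.
    assert sum_vertical(sum_horizontal(r, 2), 2) == sum_block_direct(r, 2)
```

With B = 2, the direct sum reads 25 values per pixel while the two one-dimensional passes together read 10, which is the factor (2B+1)/2 = 2.5 mentioned above. Integer channel values are used so that the two summation orders can be compared exactly.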

A second acceleration technique, which we presented in [19], is to exploit the symmetry property of the weights, i.e., $w(\mathbf{p}, \mathbf{p}+\mathbf{q}_i) = w(\mathbf{p}+\mathbf{q}_i, \mathbf{p})$. To do so, when adding $w(\mathbf{p}, \mathbf{p}+\mathbf{q}_i)\, Y(\mathbf{p}+\mathbf{q}_i)$ to the image accumulation buffer at position $\mathbf{p}$, we proposed to additionally add $w(\mathbf{p}, \mathbf{p}+\mathbf{q}_i)\, Y(\mathbf{p})$ to the image accumulation buffer at position $\mathbf{p}+\mathbf{q}_i$. Consequently, the weight $w(\mathbf{p}, \mathbf{p}+\mathbf{q}_i)$ only needs to be computed once, effectively halving the size of the search window $\delta$. However, this acceleration technique requires “non-regular” writes to the accumulation buffer, i.e., at position $\mathbf{p}+\mathbf{q}_i$ instead of $\mathbf{p}$ as required by the structure of our GPU program (4). Fortunately, our specific notation brings a solution here: by noting that $\mathbf{q}_i$ is constant in each pass, we can simply translate the input coordinates and perform a “regular” write to the accumulation buffer. This way, we need to add $w(\mathbf{p}-\mathbf{q}_i, \mathbf{p})\, Y(\mathbf{p}-\mathbf{q}_i)$ to the accumulation buffer at position $\mathbf{p}$. We will call this the translation technique. This gives us the next step of our GPU algorithm:
