the convolution operator as a cascade of a horizontal and vertical filter. Then by
setting $U_1^{(1)}(\mathbf{p}) = Y(\mathbf{p})$ and $U_2^{(1)}(\mathbf{p}) = U_3^{(1)}(\mathbf{p}) = U_4^{(1)}(\mathbf{p}) = 0$, the first pass of our
algorithm is as follows:
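\[
f^{(4i-3)}\left(U_1^{(4i-3)},\ldots,U_4^{(4i-3)}\right)(\mathbf{p}) =
\begin{pmatrix}
U_1^{(4i-3)}(\mathbf{p}) \\
U_2^{(4i-3)}(\mathbf{p}) \\
U_3^{(4i-3)}(\mathbf{p}) \\
r(0,0)^{2}_{\mathbf{p},\mathbf{q}_i}
\end{pmatrix}.
\tag{10}
\]

Note that the values $U_1^{(4i-3)}(\mathbf{p})$, $U_2^{(4i-3)}(\mathbf{p})$, $U_3^{(4i-3)}(\mathbf{p})$ are simply passed to the
next step of the algorithm. We only compute the Euclidean distance between
two pixel intensities (in RGB color space). The next passes are given by:

\[
\begin{aligned}
f^{(4i-2)}\left(U_1^{(4i-2)},\ldots,U_4^{(4i-2)}\right)(\mathbf{p}) &=
\begin{pmatrix}
U_1^{(4i-2)}(\mathbf{p}) \\
U_2^{(4i-2)}(\mathbf{p}) \\
U_3^{(4i-2)}(\mathbf{p}) \\
\sum_{\Delta x \in [-B,\ldots,B]} U_4^{(4i-2)}(p_x + \Delta x,\, p_y,\, p_t)
\end{pmatrix}, \\[6pt]
f^{(4i-1)}\left(U_1^{(4i-1)},\ldots,U_4^{(4i-1)}\right)(\mathbf{p}) &=
\begin{pmatrix}
U_1^{(4i-1)}(\mathbf{p}) \\
U_2^{(4i-1)}(\mathbf{p}) \\
U_3^{(4i-1)}(\mathbf{p}) \\
g\left(\sum_{\Delta y \in [-B,\ldots,B]} U_4^{(4i-1)}(p_x,\, p_y + \Delta y,\, p_t)\right)
\end{pmatrix}.
\end{aligned}
\tag{11}
\]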
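The passes in (10) and (11) can be illustrated with a small CPU sketch. This is not the GPU program itself, only a plain-Python illustration under several assumptions that are not taken from the text: a single frame is used (the time coordinate p_t is dropped), the neighbor lookup p + q_i is clamped at the image border, samples outside the image contribute nothing to the sums, and g is taken to be a decaying exponential with a hypothetical bandwidth. Only the fourth component is tracked, since the other three are passed through unchanged. All function names are illustrative.

```python
import math

def distance_pass(Y, q):
    """Pass (10): squared RGB distance between pixel p and pixel p + q."""
    H, W = len(Y), len(Y[0])
    qx, qy = q
    out = []
    for y in range(H):
        row = []
        for x in range(W):
            # Assumption: the lookup at p + q is clamped to the image border.
            yy = min(max(y + qy, 0), H - 1)
            xx = min(max(x + qx, 0), W - 1)
            row.append(sum((a - b) ** 2 for a, b in zip(Y[y][x], Y[yy][xx])))
        out.append(row)
    return out

def horizontal_pass(U4, B):
    """First pass of (11): sum over dx in [-B, B] along each row."""
    H, W = len(U4), len(U4[0])
    return [[sum(U4[y][x + dx] for dx in range(-B, B + 1) if 0 <= x + dx < W)
             for x in range(W)] for y in range(H)]

def vertical_pass(U4, B, g):
    """Second pass of (11): sum over dy in [-B, B] along each column, then g."""
    H, W = len(U4), len(U4[0])
    return [[g(sum(U4[y + dy][x] for dy in range(-B, B + 1) if 0 <= y + dy < H))
             for x in range(W)] for y in range(H)]

def direct_window(U4, B, g):
    """Reference: non-separable (2B+1) x (2B+1) window sum, then g."""
    H, W = len(U4), len(U4[0])
    return [[g(sum(U4[y + dy][x + dx]
                   for dy in range(-B, B + 1) if 0 <= y + dy < H
                   for dx in range(-B, B + 1) if 0 <= x + dx < W))
             for x in range(W)] for y in range(H)]

# Small synthetic RGB image with integer intensities (illustrative data only).
Y = [[((3 * x + y) % 7, (x * y) % 5, (x + 2 * y) % 4) for x in range(5)]
     for y in range(4)]
B = 1                                   # patch half-size
g = lambda d: math.exp(-d / 10.0)       # hypothetical weighting function
q = (1, 0)                              # one search-window offset q_i

D = distance_pass(Y, q)
w_sep = vertical_pass(horizontal_pass(D, B), B, g)
w_ref = direct_window(D, B, g)

# The cascade of a horizontal and a vertical sum matches the 2D window sum.
match = all(math.isclose(a, b)
            for ra, rb in zip(w_sep, w_ref) for a, b in zip(ra, rb))
```

The comparison against `direct_window` is there only to show that the cascade of two one-dimensional sums gives the same weights as the full two-dimensional window.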
The separable filtering reduces the computational complexity by a factor of $(2B+1)/2$.
Fortunately, the steps in (11) are computationally simple and require only
a small number of regular memory accesses, which can benefit from the internal
memory caches of the GPU. Note that in the last step of (11), we already
compute the similarity weights by evaluating the function $g(\cdot)$.
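The factor quoted above can be checked by counting the terms summed per pixel and per offset. The count below assumes one addition per term and ignores border effects; the value of B in the example is chosen only for illustration.

\[
\underbrace{(2B+1)^2}_{\text{direct 2D window}}
\quad\text{versus}\quad
\underbrace{(2B+1) + (2B+1)}_{\text{horizontal pass + vertical pass}}
\;=\; 2(2B+1),
\qquad
\frac{(2B+1)^2}{2(2B+1)} = \frac{2B+1}{2}.
\]

For example, with $B = 4$ the direct window sums 81 terms while the two passes together sum 18, a ratio of 4.5.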
A second acceleration technique, which we presented in [19], is to exploit the
symmetry property of the weights, i.e., $w(\mathbf{p}, \mathbf{p}+\mathbf{q}_i) = w(\mathbf{p}+\mathbf{q}_i, \mathbf{p})$. To do so, when
adding $w(\mathbf{p}, \mathbf{p}+\mathbf{q}_i)\, Y(\mathbf{p}+\mathbf{q}_i)$ to the image accumulation buffer at position $\mathbf{p}$,
we proposed to additionally add $w(\mathbf{p}, \mathbf{p}+\mathbf{q}_i)\, Y(\mathbf{p})$ to the image accumulation
buffer at position $\mathbf{p}+\mathbf{q}_i$. Consequently, the weight $w(\mathbf{p}, \mathbf{p}+\mathbf{q}_i)$ only needs to be
computed once, effectively halving the size of the search window $\delta$. However, this
acceleration technique requires "non-regular" writes to the accumulation buffer,
i.e., at position $\mathbf{p}+\mathbf{q}_i$ instead of $\mathbf{p}$ as required by the structure of our GPU
program (4). Fortunately, our specific notation offers a solution: by
noting that $\mathbf{q}_i$ is constant in each pass, we can simply translate the input
coordinates and perform a "regular" write to the accumulation buffer. This way,
we need to add $w(\mathbf{p}-\mathbf{q}_i, \mathbf{p})\, Y(\mathbf{p}-\mathbf{q}_i)$ to the accumulation buffer at position $\mathbf{p}$.
We will call this the translation technique. This gives us the next step of our
GPU algorithm:
 