Image Processing Reference


age as a long one-dimensional array of pixels; that is how it is stored in memory anyway. Now we can rewrite the above algorithm (see Pseudo-code 2).

Pseudo-code 2

foreach pixel p in f do
    x = p.getXcoordinates()
    y = p.getYcoordinates()
    Out = 0
    for s = -a to a do
        for t = -b to b do
            // calculate the weighted sum
            Out = Out + f(x + s, y + t) * w(s, t)
    g(x, y) = Out

The above algorithm collapses the two outer loops into one. Now, using CUDA's built-in thread and block identifiers, we can launch N × M threads, where each thread processes a single pixel, thus eliminating the expensive outer loops. Note that the two inner loops are not computationally expensive and can be left alone. The revised parallel algorithm is shown in Pseudo-code 3:

Pseudo-code 3

x = getPixel(threadID).getXcoordinates()
y = getPixel(threadID).getYcoordinates()
Out = 0
for s = -a to a do
    for t = -b to b do
        // calculate the weighted sum
        Out = Out + f(x + s, y + t) * w(s, t)
g(x, y) = Out

The same CUDA function can be called for different filtering effects by passing a reference to a filter. Since the filters do not change, they can be placed in constant memory for decreased access time; the kernel in Code 1 does exactly that. *ptr is a pointer to the input image, and *result is a pointer to the output image. Since this kernel is invoked on RGB images, the filter is applied to all three color channels, and each result is constrained to the range 0 to 255 using our T() function.

4 Applications for surveillance using parallel video processing

Based on the topics described earlier, we can implement several algorithms for practical surveillance applications. These algorithms can run serially, but for im-