age as a long one-dimensional array of pixels; that's how it is stored in memory anyway. Now
we can rewrite the above algorithm accordingly (see Pseudo-code 2).
Pseudo-code 2
foreach pixel p in f do
    x = p.getXcoordinates()
    y = p.getYcoordinates()
    Out = 0
    for s = -a to a do
        for t = -b to b do
            // calculate the weighted sum
            Out = Out + f(x + s, y + t) * w(s, t)
    g(x, y) = Out
The above algorithm collapses the two outer loops into a single pass over the pixels. Now, using CUDA's built-in thread and block identifiers, we can launch N × M threads, where each thread processes a single pixel, thus eliminating the two expensive outer loops entirely. Note that the two inner loops are not computationally expensive and can be left alone. So the revised parallel algorithm looks like Pseudo-code 3:
Pseudo-code 3
x = getPixel(threadID).getXcoordinates()
y = getPixel(threadID).getYcoordinates()
Out = 0
for s = -a to a do
    for t = -b to b do
        // calculate the weighted sum
        Out = Out + f(x + s, y + t) * w(s, t)
g(x, y) = Out
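As a hedged illustration of how CUDA's built-in identifiers replace the outer loops, the kernel skeleton and launch below convert a flat thread ID into pixel coordinates. The kernel name, launch configuration, and width/height parameters are assumptions for this sketch; only the thread-per-pixel idea comes from the text.

__global__ void perPixelFilter(int width, int height /*, image and filter arguments */)
{
    // Flat thread ID built from CUDA's block and thread identifiers.
    int threadID = blockIdx.x * blockDim.x + threadIdx.x;
    if (threadID >= width * height)
        return;                      // surplus threads in the last block do nothing

    int x = threadID % width;        // x coordinate of this thread's pixel
    int y = threadID / width;        // y coordinate of this thread's pixel

    // ... the two inexpensive inner loops over s and t run here, as in Pseudo-code 3 ...
}

// Host side: launch roughly one thread per pixel of an N x M image
// (here M is taken as the width and N as the height).
void launchFilter(int N, int M)
{
    int threadsPerBlock = 256;
    int blocks = (N * M + threadsPerBlock - 1) / threadsPerBlock;
    perPixelFilter<<<blocks, threadsPerBlock>>>(M, N);
}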
The same CUDA function can be called for different filtering effects by passing a reference to a filter. Since the filters do not change during execution, they can be placed in constant memory for decreased access time. The CUDA kernel is shown below in Code 1, with the filter placed in constant memory for better performance. *ptr is a pointer to the input image and *result is a pointer to the output image. Since this kernel is invoked on RGB images, the filter is applied to all three color channels and each result is constrained to the range 0 to 255 using our T() function.
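Code 1 itself is not reproduced here, so the following is only a sketch of what such a kernel could look like under stated assumptions: a 3 × 3 filter held in constant memory, interleaved 8-bit RGB pixels, and the helper names d_filter, FILTER_RADIUS, convolveRGB, width, and height, which are invented for illustration. The names ptr, result, and T() follow the description above.

#define FILTER_RADIUS 1                                      // assumption: a 3 x 3 filter
#define FILTER_WIDTH  (2 * FILTER_RADIUS + 1)

__constant__ float d_filter[FILTER_WIDTH * FILTER_WIDTH];    // filter kept in constant memory

__device__ unsigned char T(float v)                          // constrain a value to 0..255
{
    return (unsigned char)fminf(fmaxf(v, 0.0f), 255.0f);
}

__global__ void convolveRGB(const unsigned char *ptr, unsigned char *result,
                            int width, int height)
{
    int threadID = blockIdx.x * blockDim.x + threadIdx.x;    // one thread per pixel
    if (threadID >= width * height)
        return;

    int x = threadID % width;
    int y = threadID / width;

    float out[3] = {0.0f, 0.0f, 0.0f};                       // one accumulator per color
    for (int s = -FILTER_RADIUS; s <= FILTER_RADIUS; ++s) {
        for (int t = -FILTER_RADIUS; t <= FILTER_RADIUS; ++t) {
            int xs = min(max(x + s, 0), width - 1);           // clamp at the image border
            int yt = min(max(y + t, 0), height - 1);
            float wv = d_filter[(s + FILTER_RADIUS) * FILTER_WIDTH + (t + FILTER_RADIUS)];
            const unsigned char *pix = &ptr[(yt * width + xs) * 3];
            out[0] += wv * pix[0];                            // apply the filter to R, G, and B
            out[1] += wv * pix[1];
            out[2] += wv * pix[2];
        }
    }
    unsigned char *dst = &result[(y * width + x) * 3];
    dst[0] = T(out[0]);
    dst[1] = T(out[1]);
    dst[2] = T(out[2]);
}

In such a sketch, the host would copy the chosen filter coefficients into d_filter with cudaMemcpyToSymbol before launching the kernel, which is what allows the same kernel to produce different filtering effects.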
4 Applications for surveillance using parallel video processing
Based on the topics described earlier, we can implement several algorithms for practical surveillance applications. The algorithms we will describe can run serially, but for im-