Architectures for Stereo Vision - Signal Processing Systems - page 489

Fig. 8 Performance of the 3 × 3 median filter on 1280 × 960 images as the parallelization configuration changes. Block width is fixed to tdx = 32. Best performance is achieved with tdx × tdy = 32 × 4 and n_ppt = 4

Fig. 9 Performance of the 3 × 3 median filter: comparison of the texture memory kernel and the proposed shared memory kernel on a Tesla C2050 GPU for the best-performing parallelization configuration

The median filter is always compute bound and performs best with tdx × tdy = 32 × 4 threads and n_ppt = 4. The results of the parameter study for tdx = 32 are shown in Fig. 8. Configurations with n_ppt = 8 perform slightly worse because of inefficient pipeline utilization, even though they further reduce redundant memory access. Processing times for a 3 × 3 median filter (i.e. kernel radius K = 1) are given in Fig. 9: the new shared memory based kernel takes 0.64 ms. A texture-memory based kernel, which is the most often suggested way of implementing a 2D non-separable filter, takes 2.77 ms. The shared memory kernel is therefore faster by a factor of 4.3 when processing a 1280 × 960 image.
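The trade-off between n_ppt and redundant memory access can be illustrated with a little tile arithmetic. The following is a minimal sketch, not the authors' kernel. It assumes that n_ppt denotes the number of output pixels computed per thread, that these pixels are stacked vertically, and that each block loads its output tile plus a halo of K pixels on every side into shared memory. The function name tile_stats is hypothetical.

```python
import math

def tile_stats(width, height, tdx, tdy, n_ppt, K):
    """Return (number of blocks, pixels loaded per output pixel).

    Assumptions of this sketch: each thread computes n_ppt vertically
    stacked output pixels, and each block loads its output tile plus a
    K-pixel halo on every side.
    """
    tile_w = tdx                  # output tile width in pixels
    tile_h = tdy * n_ppt          # output tile height in pixels
    blocks = math.ceil(width / tile_w) * math.ceil(height / tile_h)
    loaded = (tile_w + 2 * K) * (tile_h + 2 * K)   # tile plus halo
    outputs = tile_w * tile_h
    return blocks, loaded / outputs

# 3 x 3 median filter (K = 1) on a 1280 x 960 image, block of 32 x 4 threads
for n_ppt in (1, 2, 4, 8):
    blocks, ratio = tile_stats(1280, 960, tdx=32, tdy=4, n_ppt=n_ppt, K=1)
    print(f"n_ppt={n_ppt}: {blocks} blocks, {ratio:.3f} loads per output pixel")
```

Under these assumptions the load ratio falls from about 1.59 at n_ppt = 1 to about 1.20 at n_ppt = 4 and about 1.13 at n_ppt = 8. The step from 4 to 8 saves comparatively little memory traffic, which is consistent with other effects, such as pipeline utilization, deciding the outcome for a compute-bound kernel.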

For a 9 × 9 rank transform (i.e. K = 4), experiments showed that a block size of tdx × tdy = 32 × 4 with n_ppt = 4 yields the best performance. A speed-up of 4.0 is obtained by switching from the texture-based kernel (3.13 ms) to the shared memory kernel (0.78 ms) for 1280 × 960 images.
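For readers unfamiliar with the rank transform, a plain CPU reference makes clear what the GPU kernels compute per pixel. The sketch below uses the common definition, in which each output value counts the pixels in the (2K+1) × (2K+1) window that are strictly smaller than the centre pixel. Border handling by clamping coordinates to the image is an assumption of this sketch, and the code is intended as a correctness reference only, not as a representation of the kernels measured above.

```python
def rank_transform(img, K):
    """Rank transform of a 2D list of intensities with window radius K.

    Each output value is the number of window pixels that are strictly
    smaller than the centre pixel. Coordinates outside the image are
    clamped to the nearest valid pixel (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            centre = img[y][x]
            count = 0
            for dy in range(-K, K + 1):
                for dx in range(-K, K + 1):
                    yy = min(max(y + dy, 0), h - 1)   # clamp row index
                    xx = min(max(x + dx, 0), w - 1)   # clamp column index
                    if img[yy][xx] < centre:
                        count += 1
            out[y][x] = count
    return out

# Small usage example: a 5 x 5 intensity ramp with a 3 x 3 window (K = 1)
ramp = [[5 * r + c for c in range(5)] for r in range(5)]
print(rank_transform(ramp, K=1)[2][2])  # 4 of the 8 neighbours of 12 are smaller
```

With K = 4 the same function covers the 9 × 9 window discussed above. Every output pixel reads 81 input pixels, and neighbouring windows overlap heavily, which is why staging a tile in shared memory can cut redundant loads.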
