Architectures for Stereo Vision - Signal Processing Systems

Digital Signal Processing Reference

In-Depth Information

Tabl e 2 Performance results for the entire disparity

estimation algorithm using rank transform, semi-global

matching

and

median

filtering

Nvidia

Tesla

C2050 GPU

Image size

d range

128

256

640 × 480

9 . 7ms

16 . 0ms

29 . 0ms

1024 × 768

21 . 5ms

35 . 9ms

67 . 1ms

1280

7ms

Results are k-mean values over multiple runs and images

960

9ms

2 ms

105

3.7

Implementation Example: VLSI Architecture

for Semi-global Matching

In this section a parallelization scheme and corresponding VLSI architecture for

semi-global matching will be discussed. It is based on the works of [ 7 , 8 ] .

3.7.1

Parallelization

A crucial point for VLSI-implementation is the mapping of the algorithm into a

parallel-processable and stream-based flow that only requires a single-pass across

the input images. Further important aspects are regularity and locality of the

architecture that implements this flow [ 72 ] . Challenges are imposed by the semi-

global matching due to the recursively defined paths and their orientations within

theimages(seeSect. 2.3 ) , which are not aligned to a stream-based flow.

First, the two-dimensional parallelization concept that enables stream-based

processing will be introduced. Afterwards, an extension of the concept into the third

dimension is presented, what significantly increases processing speed and through-

put. The two-dimensional parallelization concept is shown in Fig. 13 and will be

presented for the path directions of 0 ◦ ,45 ◦ ,90 ◦ , 135 ◦ . Pixels are processed from

left to right along the image row (0 ◦ path). After processing pixel p − 1 =[

]

of the upper row, all path costs over d of all directions are available in the path

−

costs buffers z − 1 . Path costs are delayed, according to their path directions of

90 ◦ and 45 ◦ by one and two additional processing steps, respectively. Afterwards,

path costs of L 45 ◦ (

are available at

the output of the path cost buffers. These are exactly those path costs needed for

parallel and synchronous calculation of all path costs of all orientations for pixel

p 2 =[

−

)

, L 90 ◦ (

−

)

, L 135 ◦ (

−

)

. Synchronous calculation allows direct summation of path costs

in a pipeline that returns the aggregated costs S .

Therefore, all paths to the pixels p 1 =[

−

]

are calculated

in parallel in a single processing step. This concept is extendable to an arbitrary

number of rows. An additional delay by two pixels is introduced for each new row,

as illustrated in Fig. 13 . Images are separated into image slices of N parallel rows in

order to process whole images. Path costs of the last row of an image slice need to

be stored and made available to the first row of the next slice.

]

and p 2 =[

−

]

Signal Processing Systems

Search WWH ::

Custom Search

Home