Digital Signal Processing Reference
In-Depth Information
Tabl e 2 Performance results for the entire disparity
estimation algorithm using rank transform, semi-global
matching
and
median
filtering
on
a
Nvidia
Tesla
C2050 GPU
Image size
d range
=
64
128
256
640 × 480
9 . 7ms
16 . 0ms
29 . 0ms
1024 × 768
21 . 5ms
35 . 9ms
67 . 1ms
×
.
.
.
1280
7ms
Results are k-mean values over multiple runs and images
960
32
9ms
56
2 ms
105
3.7
Implementation Example: VLSI Architecture
for Semi-global Matching
In this section a parallelization scheme and corresponding VLSI architecture for
semi-global matching will be discussed. It is based on the works of [ 7 , 8 ] .
3.7.1
Parallelization
A crucial point for VLSI-implementation is the mapping of the algorithm into a
parallel-processable and stream-based flow that only requires a single-pass across
the input images. Further important aspects are regularity and locality of the
architecture that implements this flow [ 72 ] . Challenges are imposed by the semi-
global matching due to the recursively defined paths and their orientations within
theimages(seeSect. 2.3 ) , which are not aligned to a stream-based flow.
First, the two-dimensional parallelization concept that enables stream-based
processing will be introduced. Afterwards, an extension of the concept into the third
dimension is presented, what significantly increases processing speed and through-
put. The two-dimensional parallelization concept is shown in Fig. 13 and will be
presented for the path directions of 0 ,45 ,90 , 135 . Pixels are processed from
left to right along the image row (0 path). After processing pixel p 1 =[
]
of the upper row, all path costs over d of all directions are available in the path
x
1
,
y
costs buffers z 1 . Path costs are delayed, according to their path directions of
90 and 45 by one and two additional processing steps, respectively. Afterwards,
path costs of L 45 (
are available at
the output of the path cost buffers. These are exactly those path costs needed for
parallel and synchronous calculation of all path costs of all orientations for pixel
p 2 =[
x
3
,
y
,
d
)
, L 90 (
x
2
,
y
,
d
)
, L 135 (
x
1
,
y
,
d
)
. Synchronous calculation allows direct summation of path costs
in a pipeline that returns the aggregated costs S .
Therefore, all paths to the pixels p 1 =[
x
2
,
y
+
1
]
are calculated
in parallel in a single processing step. This concept is extendable to an arbitrary
number of rows. An additional delay by two pixels is introduced for each new row,
as illustrated in Fig. 13 . Images are separated into image slices of N parallel rows in
order to process whole images. Path costs of the last row of an image slice need to
be stored and made available to the first row of the next slice.
x
,
y
]
and p 2 =[
x
2
,
y
+
1
]
 
 
 
Search WWH ::




Custom Search