5.4 Stereo Vision: Analysis of Memory Footprint and Bandwidth
Stereo vision algorithms are well known for their demanding computational requirements, which sometimes even prevent their deployment in practical applications with real-time constraints. On standard computing architectures such as CPUs or GPUs, this limitation is often related to number-crunching capabilities. However, on highly constrained computing architectures such as the one previously outlined, the major limitations typically consist in the massive memory footprint and/or bandwidth requirements between the memory and the processing unit.
Let us consider these facts by analyzing the simplest stereo matching algorithm, which evaluates, within a prefixed disparity range D with disparity d ∈ [d_min, d_max], the matching costs C(x, y, d) computed, on a point basis, between each point (x, y) in the reference image R and each potential corresponding pixel in the target image T. Many effective cost functions
have been proposed in the literature. Among these, the absolute difference of pixel intensities (AD), its truncated version, often referred to as truncated absolute difference (TAD), which saturates the cost at an upper threshold T, the Census transform coupled with the Hamming distance [47] and its variants such as the mini-Census [4], or the more robust ternary-based approach proposed in [30] are widely adopted by algorithms implemented in FPGAs. In fact, compared to other cost functions such as squared differences (SD), normalized cross-correlation (NCC), zero-mean normalized cross-correlation (ZNCC), robust cost functions computed on rectangular patches, or mutual information (MI) [41], AD- and Census-based approaches are certainly less demanding in terms of the reconfigurable logic required for their hardware implementation. In terms of robustness, the nonparametric local transform [47] makes the Census-based approach robust to strong photometric variations, although in its original formulation it is quite noisy in uniformly textured regions. Concerning AD, in order to increase its robustness to the photometric distortions that frequently occur in practical application scenarios, a transformation that reduces low-frequency components (e.g., a LoG (Laplacian of Gaussian) or Sobel filter) is often applied to the stereo pair before computing the AD. For the reasons outlined so far, AD- and Census-based approaches are frequently deployed by stereo vision algorithms implemented in FPGAs. Sometimes, as in [22], different cost functions (in [22], AD and Census) are combined to increase robustness. Finally, there are approaches [37] that rely on direct edge detection mechanisms to improve computational efficiency. An exhaustive review and evaluation of cost functions suited to practical stereo vision systems, not restricted to FPGA implementations, can be found in [14].
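As a concrete reference, the cost functions named above can be sketched in a few lines of C; the 3x3 Census window and the threshold value below are illustrative choices, and hardware implementations often use larger windows.

```c
#include <stdint.h>

/* Truncated absolute difference (TAD): the AD cost saturated at an
 * upper threshold T (the threshold value is chosen by the caller). */
static int tad_cost(uint8_t a, uint8_t b, int T)
{
    int diff = a > b ? a - b : b - a;
    return diff < T ? diff : T;
}

/* Census transform on a 3x3 window: each neighbor contributes one bit,
 * set when the neighbor is darker than the central pixel. The image is
 * row-major 8-bit grayscale of width W; (x, y) must not lie on the border. */
static uint8_t census3x3(const uint8_t *img, int W, int x, int y)
{
    uint8_t center = img[y * W + x], code = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            if (dx == 0 && dy == 0)
                continue;
            code = (uint8_t)(code << 1);
            if (img[(y + dy) * W + (x + dx)] < center)
                code |= 1;
        }
    return code;
}

/* Matching cost between two Census codes: the Hamming distance,
 * i.e., the number of differing bits. */
static int hamming8(uint8_t a, uint8_t b)
{
    uint8_t v = (uint8_t)(a ^ b);
    int n = 0;
    while (v) { n += v & 1; v >>= 1; }
    return n;
}
```

Note that the Hamming distance reduces to a population count of an XOR, which maps naturally onto reconfigurable logic; this is one reason Census-based costs are popular in FPGA designs.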
Considering the previous example, from the memory point of view, stacking each cost C(x, y, d), with d ∈ [d_min, d_max], for each point and for each disparity within the disparity range would result in the 3D memory structure depicted in Fig. 5.3, often referred to as the DSI (Disparity Space Image). However, in most effective algorithms adopted in practical applications, the matching cost evaluated to determine the best disparity value consists of aggregated pointwise matching costs C(x, y, d), accumulated costs along