5.4 Stereo Vision: Analysis of Memory Footprint and Bandwidth
Stereo vision algorithms are well known for their demanding computational requirements, which sometimes even prevent their deployment in practical applications with real-time constraints. On standard computing architectures such as CPUs or GPUs, this limitation is often related to number-crunching capabilities. However, on highly constrained computing architectures such as the one previously outlined, the major limitations typically consist in the massive memory footprint and/or bandwidth requirements between the memory and the processing unit.
Let us consider these facts by analyzing the simplest stereo matching algorithm, which evaluates, within a prefixed disparity range D with disparity d ∈ [d_min, d_max], the matching costs C(x, y, d) computed, on a point basis, between each point (x, y) in the reference image R and each potential corresponding pixel in the target image T. Many effective cost functions
have been proposed in the literature. Among these, the absolute difference of pixel intensities (AD), its truncated version, often referred to as truncated absolute difference (TAD), which saturates the cost at an upper threshold T, the Census transform coupled with the Hamming distance [47] and its variants such as the mini-Census [4], or the more robust ternary-based approach proposed in [30] are widely adopted by algorithms implemented in FPGAs. In fact, compared to other cost functions such as squared differences (SD), normalized cross-correlation (NCC), zero-mean normalized cross-correlation (ZNCC), robust cost functions computed on rectangular patches, or mutual information (MI) [41], AD- and Census-based approaches are certainly less demanding in terms of the reconfigurable logic required for their hardware implementation. In terms of robustness, the nonparametric local transform [47] makes the Census-based approach robust to strong photometric variations, although in its original formulation it is quite noisy in uniformly textured regions. Concerning AD, in order to increase its robustness to the photometric distortions that frequently occur in practical application scenarios, a transformation that reduces low-frequency components (e.g., a LoG (Laplacian of Gaussian) or Sobel filter) is often applied to the stereo pair before computing the AD. For the reasons outlined so far, AD- and Census-based approaches are frequently deployed by stereo vision algorithms implemented in FPGAs. Sometimes, as in [22], different cost functions (in [22], AD and Census) are combined to increase robustness. Finally, there are approaches [37] that rely on direct edge detection mechanisms to improve computational efficiency. An exhaustive review and evaluation of cost functions suited to practical stereo vision systems, not restricted to FPGA implementations, can be found in [14].
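As a concrete reference, the cost functions named above can be sketched in a few lines of C; the 3x3 Census window and the threshold value below are illustrative choices, and hardware implementations often use larger windows.

```c
#include <stdint.h>

/* Truncated absolute difference (TAD): the AD cost saturated at an
 * upper threshold T (the threshold value is chosen by the caller). */
static int tad_cost(uint8_t a, uint8_t b, int T)
{
    int diff = a > b ? a - b : b - a;
    return diff < T ? diff : T;
}

/* Census transform on a 3x3 window: each neighbor contributes one bit,
 * set when the neighbor is darker than the central pixel. The image is
 * row-major 8-bit grayscale of width W; (x, y) must not lie on the border. */
static uint8_t census3x3(const uint8_t *img, int W, int x, int y)
{
    uint8_t center = img[y * W + x], code = 0;
    for (int dy = -1; dy <= 1; dy++)
        for (int dx = -1; dx <= 1; dx++) {
            if (dx == 0 && dy == 0)
                continue;
            code = (uint8_t)(code << 1);
            if (img[(y + dy) * W + (x + dx)] < center)
                code |= 1;
        }
    return code;
}

/* Matching cost between two Census codes: the Hamming distance,
 * i.e., the number of differing bits. */
static int hamming8(uint8_t a, uint8_t b)
{
    uint8_t v = (uint8_t)(a ^ b);
    int n = 0;
    while (v) { n += v & 1; v >>= 1; }
    return n;
}
```

Note that the Hamming distance reduces to a population count of an XOR, which maps naturally onto reconfigurable logic; this is one reason Census-based costs are popular in FPGA designs.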
Considering the previous example, from the memory point of view, stacking each cost C(x, y, d), with d ∈ [d_min, d_max], for each point and for each disparity within the disparity range would result in the 3D memory structure depicted in Fig. 5.3, often referred to as the DSI (Disparity Space Image). However, in most effective algorithms adopted in practical applications, the matching cost evaluated to determine the best disparity value consists of aggregated pointwise matching costs C(x, y, d), accumulated costs along