Digital Signal Processing Reference
[48] presented a multi-resolution symmetric dynamic programming variant on a
GTX 295 reaching 14 fps for 2048 × 2048 × 256 images. A total variation algorithm
with GPU implementation has been presented requiring between 15 and 60 s per
image [73].
Variants of local methods examining different techniques of adaptive weights
or adaptive support regions have received much attention. Recent local approaches
include a census-based method with basic box filter cost aggregation [92] and a
local truncated Laplacian kernel approximation with adaptive cost aggregation [44].
Locally adaptive support regions have been used and accelerated with bitwise
voting in [50]. Further work on local variants with adaptive cost aggregation
methods includes [45, 63] and [40]. Instead of adaptive support regions on the
input images, [61] uses edge-preserving filtering on the matching costs. A
comparison of six local methods in terms of algorithmic and computational
performance on GPUs has been conducted in [29]. A plane sweep algorithm with
local depth connectivity in order to retain depth discontinuities has been
examined in [16].
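To make the term "box filter cost aggregation" concrete, the following is a minimal sketch in plain Python, not taken from any of the cited works. It sums the matching costs of one disparity level over a square window with uniform weights. The function name `box_aggregate`, the `radius` parameter, and the choice to truncate the window at the image border are assumptions made for illustration only.

```python
def box_aggregate(cost, radius=1):
    """Sum a 2D cost slice over a (2*radius+1)^2 window.

    `cost` is a list of rows holding the per-pixel matching cost for one
    disparity level. The window is truncated at the image border
    (an assumption of this sketch; other border policies are possible).
    """
    h, w = len(cost), len(cost[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += cost[yy][xx]  # uniform weight of 1 per pixel
            out[y][x] = total
    return out


# Hypothetical 3x3 cost slice for a single disparity level.
cost_slice = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]
aggregated = box_aggregate(cost_slice, radius=1)
```

Adaptive-weight and adaptive-support variants differ from this sketch in that the contribution of each neighbor, or the shape of the window itself, depends on the image content instead of being uniform.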
For SGM, various implementations have been presented on a GeForce 8800 Ultra
[19] (0.0057 fps at 640 × 480 × 128), a Quadro FX5600 [27], a GTX 280 without
[31] and with increased depth accuracy [67], and on a Tesla C2050 [4], which is
the highest performing implementation with 63 fps for 640 × 480 × 128 images. This
allows a very interesting retrospective on the evolution of GPUs. Especially some of
the new features of Nvidia's compute capability 2.0 graphics cards allow radically
different parallelization schemes, which was exploited in [4]. We will have a detailed
look at this implementation in Sect. 3.6. Furthermore, a combination of adaptive
support regions with a reduced version of SGM is proposed in [62] reaching 10 fps
for 450 × 375 × 64 images.
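Because the quoted frame rates refer to different image sizes and disparity ranges, they are hard to compare directly. One common normalization, sketched below under the assumption that a size given as W × H × D means width, height, and number of disparity levels, is the number of disparity hypotheses evaluated per second (fps · W · H · D), often stated in millions (MDE/s). The function name is hypothetical, and the metric ignores differences in algorithm quality.

```python
def disparity_evals_per_second(fps, width, height, disparities):
    """Disparity hypotheses evaluated per second: fps * W * H * D."""
    return fps * width * height * disparities


# Figures as quoted in the text: (fps, width, height, disparity levels).
tesla_c2050 = disparity_evals_per_second(63, 640, 480, 128)   # SGM on Tesla C2050 [4]
reduced_sgm = disparity_evals_per_second(10, 450, 375, 64)    # reduced SGM variant [62]

print(f"{tesla_c2050 / 1e6:.1f} MDE/s")  # 2477.3 MDE/s
print(f"{reduced_sgm / 1e6:.1f} MDE/s")  # 108.0 MDE/s
```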
3.2 Dedicated Architectures (FPGA and VLSI)
For dedicated architectures targeting FPGAs or ASICs, local methods are often
favored because of potentially very small designs. This goes as far as omitting
the cost aggregation altogether despite the drawbacks in accuracy and robustness.
Nevertheless, new cost aggregation concepts have also been investigated and
incorporated in hardware. In the following, implementations without cost
aggregation are indicated with “w/o CA”.
Some examples of early architectures using SAD-based matching w/o CA are
[2, 54, 64]. An SAD-based stereo vision system with three cameras has been
presented in [98]. Depending on the emphasis of the referenced work, the results
vary in throughput and resolution up to 640 × 480 × 64 and 31 fps. The so-called Tyzx
ASIC for color-image census-based stereo matching (w/o CA) achieves 200 fps for
512 × 480 images and 52 disparity levels [93]. It forms the basis of an extended
stereo vision system in [94].
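As an illustration of what census-based matching without cost aggregation involves, here is a minimal software sketch in plain Python. It is not a description of the Tyzx ASIC or of any other cited design. The 3 × 3 window, the bit ordering, the left image as reference, and all function names are assumptions chosen for brevity. Each pixel is encoded by comparing it with its neighbors, the matching cost is the Hamming distance between two such codes, and the disparity is picked per pixel by winner-takes-all on the raw costs.

```python
def census_3x3(img):
    """8-bit census code per interior pixel; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            bits = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the center pixel itself
                    # Append a 1 if the neighbor is darker than the center.
                    bits = (bits << 1) | int(img[y + dy][x + dx] < img[y][x])
            out[y][x] = bits
    return out


def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")


def wta_disparity(left, right, max_disp):
    """Winner-takes-all on raw per-pixel Hamming costs (no cost aggregation)."""
    cl, cr = census_3x3(left), census_3x3(right)
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_cost, best_d = None, 0
            # Only test disparities that stay inside the right image.
            for d in range(min(max_disp, x) + 1):
                c = hamming(cl[y][x], cr[y][x - d])
                if best_cost is None or c < best_cost:
                    best_cost, best_d = c, d
            disp[y][x] = best_d
    return disp


# Hypothetical 3x3 test image.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
codes = census_3x3(img)
```

Since the disparity is chosen from a single pixel's cost, such designs need no storage or logic for aggregating costs, which is the size advantage mentioned above, at the price of noisier disparity maps.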
Local methods with and without cost aggregation also remain popular in recent
implementations. This includes [46] where a census transform (w/o CA) is employed