Normalized cross-correlations are computed between a window of pixels around
$p$ in the reference image and the windows in the neighborhood images implied by
the hypothesized depth $d(p)$. If the candidate depth is correct, we expect all of the
normalized cross-correlation values to be high; thus, the depth is accepted if these
values are above a threshold for enough of the neighborhood images. On the other
hand, if all depths are incorrect, or if some depth is correct but several images contain
specularities or occlusions, no $d(p)$ is estimated at $p$. Points that are assigned depths
are also given confidences in the depth estimates based on the average normalized
cross-correlation values of the neighbors that contributed. The higher the values,
and the more neighbors that agreed, the higher the confidence in the depth estimate.
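The acceptance test just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact procedure of the cited work: it assumes the neighborhood-image windows implied by $d(p)$ have already been extracted (the projection step is omitted), and the threshold `ncc_threshold`, the minimum count `min_agreeing`, and the confidence formula (mean NCC of the agreeing neighbors scaled by the fraction of neighbors that agreed) are hypothetical choices.

```python
import numpy as np


def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two equal-size windows (range [-1, 1])."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    if denom < eps:
        # A constant (textureless) window has no defined correlation; treat it as a non-match.
        return 0.0
    return float(np.sum(a * b) / denom)


def accept_depth(ref_window, neighbor_windows, ncc_threshold=0.6, min_agreeing=2):
    """Decide whether a hypothesized depth d(p) is accepted at pixel p.

    ref_window: window around p in the reference image.
    neighbor_windows: windows in the neighborhood images implied by d(p)
        (hypothetical input; the projection that produces them is not shown).
    Returns (accepted, confidence); confidence is 0.0 when rejected.
    """
    scores = [ncc(ref_window, w) for w in neighbor_windows]
    agreeing = [s for s in scores if s >= ncc_threshold]
    if len(agreeing) < min_agreeing:
        # Too few views agree (wrong depth, specularities, occlusions): no d(p) at p.
        return False, 0.0
    # Hypothetical confidence: higher mean NCC and more agreeing neighbors give higher values.
    confidence = float(np.mean(agreeing)) * len(agreeing) / len(neighbor_windows)
    return True, confidence


rng = np.random.default_rng(0)
ref = rng.random((7, 7))
ok, conf = accept_depth(ref, [2 * ref + 1, ref + 0.01 * rng.random((7, 7)), 0.5 * ref])
print(ok, round(conf, 3))
```

A window that is a brightness- and contrast-scaled copy of the reference (for example, `2 * ref + 1`) scores an NCC of 1, which is one reason NCC is a convenient default when the views differ by simple photometric changes.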
The multiple depth maps are then merged using Curless and Levoy's VRIP algorithm,
discussed in detail in Section 8.4.3. As with patch-based methods, the resulting 3D
reconstructions may contain holes in low-confidence regions, which can be smoothly
interpolated if desired.
In contrast to the methods discussed in Section 5.5, depth map computation for
multi-view stereo pairs is usually fairly unsophisticated, often using simple
normalized cross-correlation instead of a more geometrically or photometrically natural
measure. The rationale is that the merging algorithm should take care of outlier
rejection, especially when there is substantial redundancy in the source images. On the
other hand, when there are few source images, some per-pair outlier rejection prior
to depth map fusion can obtain better results (e.g., see Campbell et al. [80]). Another
approach is to evaluate normalized cross-correlations between a square window in
the reference image and rectangular windows of different widths in the neighborhood
images, to account for perspective distortion [62].
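The rectangular-window variant can be sketched as follows. The sketch assumes a square reference window with odd side length and odd candidate widths; it stretches or squeezes each rectangular neighbor window back to a square with linear interpolation along rows and keeps the best-scoring width. The resampling step, the max-over-widths selection rule, and the names `resample_width` and `best_rect_ncc` are our assumptions for illustration, not details given by [62].

```python
import numpy as np


def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two equal-size windows (range [-1, 1])."""
    a = np.asarray(a, dtype=float).ravel() - np.mean(a)
    b = np.asarray(b, dtype=float).ravel() - np.mean(b)
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return 0.0 if denom < eps else float(np.sum(a * b) / denom)


def resample_width(window, new_width):
    """Linearly resample every row of an (h, w) window to new_width columns."""
    h, w = window.shape
    src = np.arange(w, dtype=float)            # original column positions
    dst = np.linspace(0.0, w - 1, new_width)   # new positions spanning the same extent
    return np.stack([np.interp(dst, src, row) for row in window])


def best_rect_ncc(ref_window, neighbor_image, cx, cy, widths):
    """Match a square reference window against rectangular windows of several
    (odd) widths centered at column cx, row cy of the neighbor image.

    Returns (best_score, best_width); best_width is None if no window fits.
    """
    size = ref_window.shape[0]                 # square window, odd side length assumed
    half_h = size // 2
    rows, cols = neighbor_image.shape
    best_score, best_width = -1.0, None
    for w in widths:
        half_w = w // 2
        y0, y1 = cy - half_h, cy + half_h + 1
        x0, x1 = cx - half_w, cx + half_w + 1
        if y0 < 0 or x0 < 0 or y1 > rows or x1 > cols:
            continue                           # window would fall off the image
        rect = neighbor_image[y0:y1, x0:x1]
        # Undo the hypothesized horizontal foreshortening before correlating.
        score = ncc(ref_window, resample_width(rect, size))
        if score > best_score:
            best_score, best_width = score, w
    return best_score, best_width


rng = np.random.default_rng(1)
image = rng.random((20, 20))
ref = image[7:14, 7:14]                        # 7x7 window centered at row 10, column 10
print(best_rect_ncc(ref, image, cx=10, cy=10, widths=[5, 7, 9]))
```

Here the reference window is cut from the same image, so the undistorted width 7 wins; with a real second view, a surface slanted away from the camera would typically favor a narrower or wider rectangle.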
8.3.5 Space-Time Stereo
Finally, we briefly describe space-time stereo approaches, which are a hybrid
between multi-view stereo and structured lighting algorithms. The basic observation
is that stereo algorithms perform poorly in regions with little texture; thus, a projector
is used to introduce artificial texture into the scene in the form of a high-frequency
pattern. This texture gives the stereo algorithms something to "grab onto." The main
difference from structured light techniques is that the projector is not calibrated; the
pattern is only used to introduce texture, as opposed to coding the precise horizontal
position of a stripe.
The space-time stereo concept was proposed at about the same time by Zhang et
al. [569] and Davis et al. [114]. We begin with a normal window-based stereo matching
cost function between two rectified images $I_1$ and $I_2$:
$$C(x_0, y_0, d) = \sum_{(x, y) \in W} e\big(I_2(x - d, y),\; I_1(x, y)\big) \tag{8.12}$$

where $W$ is a window centered at the pixel of interest $(x_0, y_0)$, $d$ is a candidate disparity,
and $e$ is some pixel-to-pixel cost function (e.g., the absolute or squared distance).
The key idea is to extend Equation (8.12) to a space-time-window-based matching
function:
$$C(x_0, y_0, t_0, d) = \sum_{t \in T} \sum_{(x, y) \in W} e\big(I_2(x - d, y, t),\; I_1(x, y, t)\big) \tag{8.13}$$
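Equations (8.12) and (8.13) translate directly into array code. In this minimal sketch, $e$ is taken to be the squared difference, images are NumPy arrays indexed `[y, x]` (videos `[t, y, x]`), disparities are integers, and the temporal window $T$ is assumed to be the frames from $t_0$ minus `half_t` to $t_0$ plus `half_t`, since this excerpt does not define $T$. The function names and the winner-take-all disparity choice are hypothetical, and the caller is assumed to keep every window inside the images.

```python
import numpy as np


def spatial_cost(I1, I2, x0, y0, d, half=2):
    """Equation (8.12) with e = squared difference.

    Sums e(I2(x - d, y), I1(x, y)) over the (2*half+1) x (2*half+1) window W
    centered at (x0, y0). Images are indexed [y, x].
    """
    rows = slice(y0 - half, y0 + half + 1)
    cols1 = slice(x0 - half, x0 + half + 1)           # x for (x, y) in W
    cols2 = slice(x0 - half - d, x0 + half + 1 - d)   # x - d in I2
    diff = I2[rows, cols2].astype(float) - I1[rows, cols1].astype(float)
    return float(np.sum(diff ** 2))


def spacetime_cost(V1, V2, x0, y0, t0, d, half=2, half_t=1):
    """Equation (8.13): the spatial cost summed over frames t in T.

    V1, V2 are rectified videos indexed [t, y, x]; T is assumed to be
    t0 - half_t, ..., t0 + half_t.
    """
    return sum(spatial_cost(V1[t], V2[t], x0, y0, d, half)
               for t in range(t0 - half_t, t0 + half_t + 1))


def best_disparity(V1, V2, x0, y0, t0, disparities, **kwargs):
    """Winner-take-all: the candidate disparity with the lowest space-time cost."""
    costs = [spacetime_cost(V1, V2, x0, y0, t0, d, **kwargs) for d in disparities]
    return disparities[int(np.argmin(costs))]


rng = np.random.default_rng(2)
V1 = rng.random((5, 20, 30))
V2 = np.roll(V1, -3, axis=2)   # V2[t, y, x - 3] == V1[t, y, x]: true disparity is 3
print(best_disparity(V1, V2, x0=15, y0=10, t0=2, disparities=list(range(6))))  # prints 3
```

With `half_t=0` the space-time cost reduces to the purely spatial cost of Equation (8.12) evaluated at frame $t_0$.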