Three-Dimensional Data Acquisition - Computer Vision for Visual Effects


Normalized cross-correlations are computed between a window of pixels around p in the reference image and the windows in the neighborhood images implied by the hypothesized depth d(p). If the candidate depth is correct, we expect all of the normalized cross-correlation values to be high; thus, the depth is accepted if these values are above a threshold for enough of the neighborhood images. On the other hand, if all depths are incorrect, or some depth is correct but several images contain specularities or occlusions, no d(p) is estimated at p. Points that are assigned depths are also given confidences in the depth estimates based on the average normalized cross-correlation values of the neighbors that contributed. The higher the values, and the more neighbors that agreed, the higher the confidence in the depth estimate. The multiple depth maps are then merged using Curless and Levoy's VRIP algorithm, discussed in detail in Section 8.4.3. As with patch-based methods, the resulting 3D reconstructions may contain holes in low-confidence regions, which can be smoothly interpolated if desired.
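The acceptance test described above can be illustrated with a minimal sketch. It assumes the windows in the neighborhood images have already been sampled at the locations implied by the hypothesized depth. The function names, the threshold `ncc_thresh`, the minimum view count `min_views`, and the specific confidence formula (mean score of the agreeing neighbors, scaled by the fraction that agreed) are illustrative assumptions, not values or formulas from the text.

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two equally sized windows, in [-1, 1]."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom < eps:
        return 0.0  # textureless window: treat as uncorrelated
    return float(a @ b / denom)

def accept_depth(ref_window, neighbor_windows, ncc_thresh=0.6, min_views=2):
    """Return (accepted, confidence) for one hypothesized depth.

    neighbor_windows holds one window per neighborhood image, sampled at the
    location implied by the hypothesized depth (sampling is not shown here).
    ncc_thresh and min_views are hypothetical parameters.
    """
    scores = [ncc(ref_window, w) for w in neighbor_windows]
    good = [s for s in scores if s > ncc_thresh]
    if len(good) < min_views:
        return False, 0.0  # no depth is estimated at this pixel
    # Assumed confidence: higher scores and more agreeing neighbors both raise it.
    confidence = float(np.mean(good)) * len(good) / len(scores)
    return True, confidence
```

Because normalized cross-correlation subtracts the mean and divides by the norm of each window, the score is unchanged by a gain or offset difference between images, which is one reason it is a common default for this kind of test.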

In contrast to the methods discussed in Section 5.5, depth map computation for multi-view stereo pairs is usually fairly unsophisticated, often using simple normalized cross-correlation instead of a more geometrically or photometrically natural measure. The rationale is that the merging algorithm should take care of outlier rejection, especially when there is substantial redundancy in the source images. On the other hand, when there are few source images, some per-pair outlier rejection prior to depth map fusion can obtain better results (e.g., see Campbell et al. [80]). Another approach is to evaluate normalized cross-correlations between a square window in the reference image and rectangular windows of different widths in the neighborhood images, to account for perspective distortion [62].
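One way to picture the rectangular-window idea is the sketch below. It compares a square reference window against neighbor windows of several candidate widths and keeps the best score. The text does not say how windows of different sizes are compared; linearly resampling each rectangular window to the reference width is an assumption made here for illustration, as are the function names and the requirement that widths be odd and windows lie inside the image.

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two equally sized windows."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > eps else 0.0

def resample_width(window, out_width):
    """Linearly resample each row of a 2D window to out_width columns."""
    _, w = window.shape
    src = np.linspace(0.0, 1.0, w)
    dst = np.linspace(0.0, 1.0, out_width)
    return np.stack([np.interp(dst, src, row) for row in window])

def best_ncc_over_widths(ref_window, neighbor_img, cx, cy, widths):
    """Return (best_score, best_width) over candidate odd window widths.

    The neighbor window is centered at column cx, row cy, and is assumed
    to lie fully inside neighbor_img for every candidate width.
    """
    h, w = ref_window.shape
    half_h = h // 2
    best = (-2.0, None)  # any NCC score (>= -1) beats this sentinel
    for width in widths:
        half = width // 2
        patch = neighbor_img[cy - half_h: cy + half_h + 1,
                             cx - half: cx + half + 1]
        score = ncc(ref_window, resample_width(patch, w))
        if score > best[0]:
            best = (score, width)
    return best
```

If the neighbor image sees the surface stretched horizontally by some factor, the candidate width that matches that stretch should score highest, which is the effect the varying widths are meant to capture.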


8.3.5 Space-Time Stereo

Finally, we briefly describe space-time stereo approaches, which are a hybrid between multi-view stereo and structured lighting algorithms. The basic observation is that stereo algorithms perform poorly in regions with little texture; thus, a projector is used to introduce artificial texture into the scene in the form of a high-frequency pattern. This texture gives the stereo algorithms something to “grab onto.” The main difference from structured light techniques is that the projector is not calibrated; the pattern is only used to introduce texture, as opposed to coding the precise horizontal position of a stripe.

The space-time stereo concept was proposed at about the same time by Zhang et al. [569] and Davis et al. [114]. We begin with a normal window-based stereo matching cost function between two rectified images $I_1$ and $I_2$,

$$C(x_0, y_0, d) = \sum_{(x, y) \in W} e\big(I_2(x - d, y),\, I_1(x, y)\big) \tag{8.12}$$

where $W$ is a window centered at the pixel of interest $(x_0, y_0)$, $d$ is a candidate disparity, and $e$ is some pixel-to-pixel cost function (e.g., the absolute or squared distance). The key idea is to extend Equation (8.12) to a space-time-window-based matching function

$$C(x_0, y_0, t_0, d) = \sum_{t \in T} \sum_{(x, y) \in W} e\big(I_2(x - d, y, t),\, I_1(x, y, t)\big) \tag{8.13}$$
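The space-time matching cost of Equation (8.13) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the two rectified videos are stored as arrays of shape (frames, rows, columns), the pixel-to-pixel cost e is the squared difference, disparities are non-negative integers, and the window half-sizes `half_w` and `half_t` are hypothetical parameters. Setting `half_t=0` reduces it to the purely spatial cost of Equation (8.12) on frame `t0`.

```python
import numpy as np

def spacetime_cost(I1, I2, x0, y0, t0, d, half_w=2, half_t=1):
    """Sum of squared differences over a space-time window.

    I1, I2: rectified videos of shape (T, H, W). The window covers
    frames t0 +/- half_t and pixels (x0 +/- half_w, y0 +/- half_w).
    Returns inf if any part of either window falls outside the data.
    """
    T, H, W = I1.shape
    if (t0 - half_t < 0 or t0 + half_t >= T
            or y0 - half_w < 0 or y0 + half_w >= H
            or x0 - half_w - d < 0 or x0 + half_w >= W):
        return float("inf")
    ts = slice(t0 - half_t, t0 + half_t + 1)
    ys = slice(y0 - half_w, y0 + half_w + 1)
    xs1 = slice(x0 - half_w, x0 + half_w + 1)          # I1(x, y, t)
    xs2 = slice(x0 - half_w - d, x0 + half_w + 1 - d)  # I2(x - d, y, t)
    w1 = I1[ts, ys, xs1].astype(float)
    w2 = I2[ts, ys, xs2].astype(float)
    return float(np.sum((w2 - w1) ** 2))

def best_disparity(I1, I2, x0, y0, t0, max_d, **window):
    """Winner-take-all disparity: the d in [0, max_d] with the lowest cost."""
    costs = [spacetime_cost(I1, I2, x0, y0, t0, d, **window)
             for d in range(max_d + 1)]
    return int(np.argmin(costs))
```

Summing over several frames lets the temporally varying projected pattern disambiguate matches that a single textureless frame could not, at the price of assuming the disparity is constant across the temporal window.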
