where T is a temporal window of frames centered at the current time instant t₀.
The basic observation is that the spatio-temporal profile centered at a pixel is more
discriminative than the spatial neighborhood alone, which is the same concept that
underlies the space-time analysis of structured light patterns illustrated in Figure 8.14.
The number of frames in T and the way they are chosen is up to the user, and could
depend on the speed of motion in the scene; in the same way, the size of the window
W_T depends on the image resolution and amount of scene texture.
The advantage of the space-time framework is that it is well suited to dynamic
scenes, since accurate correspondences may be difficult to estimate on a frame-by-
frame basis. Zhang et al. accounted for surfaces that are non-fronto-parallel and/or
moving using a linear prediction of each pixel's changing disparity (i.e., the spatio-
temporal windows are "slanted" along all three axes, not rectangular solids). They
later introduced further constraints to enforce spatio-temporal consistency in the
estimates by formulating the optimization globally [570].
Since the projector is not calibrated against the cameras and the pattern is not coded,
any pattern with high-frequency spatial detail will do. Zhang et al. [569] used shuffled
Gray codes and binary checkerboards, while Davis et al. [114] used random patterns
of binary vertical stripes and even a simple flashlight. Both groups based the cost
function C in Equation (8.13) simply on the sum of squared differences.
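To make the cost concrete, the following is a minimal NumPy sketch of a spatio-temporal SSD matching cost in the spirit of Equation (8.13). The function name, window parameterization, and the integer rounding of the slanted disparity are illustrative assumptions, not the authors' implementation; a practical version would interpolate sub-pixel shifts and vectorize the loops.

```python
import numpy as np

def spacetime_ssd(left, right, x0, y0, t0, d0,
                  dx=0.0, dy=0.0, dt=0.0, half_w=3, half_t=1):
    """Spatio-temporal SSD matching cost at pixel (x0, y0), frame t0.

    left, right : rectified grayscale video volumes, shape (frames, H, W).
    d0          : candidate disparity at the window center.
    dx, dy, dt  : coefficients of the linear ("slanted") disparity model;
                  all zero recovers a rectangular, fronto-parallel window.
    Assumes the window and its shifted counterpart lie inside the images.
    """
    cost = 0.0
    for t in range(t0 - half_t, t0 + half_t + 1):
        for y in range(y0 - half_w, y0 + half_w + 1):
            for x in range(x0 - half_w, x0 + half_w + 1):
                # Disparity varies linearly across the window, so the
                # window is "slanted" along all three axes.
                d = d0 + dx * (x - x0) + dy * (y - y0) + dt * (t - t0)
                diff = left[t, y, x] - right[t, y, x - int(round(d))]
                cost += diff * diff
    return cost
```

Setting dx = dy = dt = 0 recovers the basic rectangular spatio-temporal window; nonzero coefficients implement the slanted windows described above.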
8.4 REGISTERING 3D DATASETS
3D data acquired using LiDAR or structured light from a single point of view suffers
from the shadowing problem illustrated in Figure 8.3. That is, we only get a depth
estimate at a given pixel for the corresponding scene surface closest to the camera.
Therefore, we commonly move the scanner around the scene to acquire scans from
viewpoints that fill in the gaps and make the 3D model more complete.
In this section, we address two key problems associated with this process. The first
is how to align multiple 3D datasets into the same coordinate system. We take a similar
approach to the problem of 2D image alignment: features in each scan are detected,
matched, and used as the basis for estimating a parametric transformation between
each scan pair. However, in 3D we need different methods for feature detection and
registration, as we discuss in Sections 8.4.1 and 8.4.2.
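For concreteness, once candidate feature matches between two scans are in hand, the least-squares rigid transformation aligning them can be computed in closed form using an SVD (the standard Kabsch/Umeyama construction). The sketch below is a generic version of this step under the assumption of a rigid (rotation plus translation) model; it is not necessarily the exact formulation used in Section 8.4.2, and the function name and array conventions are illustrative.

```python
import numpy as np

def rigid_transform_from_matches(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q.

    P, Q : (N, 3) arrays of matched 3D feature locations in two scans.
    Returns rotation R (3x3) and translation t (3,) minimizing
    sum_i || R @ P[i] + t - Q[i] ||^2.
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)   # centroids of each point set
    H = (P - cp).T @ (Q - cq)                 # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) in the recovered rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```

In practice, such a closed-form estimate from feature matches typically serves as the initialization for a finer registration algorithm, as discussed in the sections that follow.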
Once we have a method for aligning scans, the second problem is how to cre-
ate a usable triangular mesh from the resulting collection of points. Algorithms for
these problems of multiscan fusion and meshing are overviewed in Section 8.4.3.
Throughout this section, we motivate the algorithms using data acquired from LiDAR
scanners, but the same methods apply to point clouds created from structured light
or multi-view stereo.
8.4.1 Feature Detection and Matching
The goal of feature detection and matching in 3D is the same as in 2D: to find regions
of a scan that can be reliably, unambiguously matched with scans of the same scene
from different perspectives. These feature matches can subsequently be used to ini-
tialize or aid in registration, as described in the next section. However, the nature of 3D
data requires us to rethink the criteria for what makes a "good" feature. Figure 8.29a,