Matchmoving - Computer Vision for Visual Effects

assumed to lie on planar surfaces. The world coordinate system and its scale are initialized by placing a calibration pattern with known dimensions in front of the camera before it begins to move. The unknown depths of features in the environment are estimated with greater accuracy as the camera views them from different positions. A smoothness prior that the camera moves with constant velocity and constant angular velocity is imposed to make the camera parameter estimation robust to video segments that contain few features. We discuss probabilistic methods for state estimation in more detail in the context of motion capture in Chapter 7. The book by Thrun et al. [489] is an excellent reference for probabilistic robot localization, though it does not emphasize vision-based SLAM.
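To make the role of the smoothness prior concrete, here is a minimal sketch of a constant-velocity predictor for a single camera coordinate. It uses a simple alpha-beta filter as a stand-in for a full probabilistic state estimator, and it ignores rotation entirely. The function name `track_axis`, the gains `alpha` and `beta`, and the convention that `None` marks a frame with too few features are all assumptions made for illustration, not details of the system described above.

```python
def track_axis(measurements, x0, v0, dt=1.0, alpha=0.5, beta=0.1):
    """Track one camera coordinate under a constant-velocity prior.

    measurements: per-frame position measurements, or None for frames
    where too few features were tracked to measure anything.
    alpha, beta: illustrative correction gains (assumed values).
    """
    x, v = x0, v0
    estimates = []
    for z in measurements:
        # Prior: the camera keeps moving at its current velocity.
        x_pred = x + v * dt
        if z is None:
            # No usable features: fall back on the prediction alone.
            x = x_pred
        else:
            # Correct position and velocity using the measurement residual.
            residual = z - x_pred
            x = x_pred + alpha * residual
            v = v + (beta / dt) * residual
        estimates.append(x)
    return estimates


# Two frames without measurements are bridged by the motion prior.
print(track_axis([1.0, 2.0, None, None, 5.0], x0=0.0, v0=1.0))
```

When the camera really does move at constant velocity, the prediction carries the estimate through the feature-poor frames without error. When it does not, the estimate degrades until measurements return.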

While these techniques are promising, a key consideration is drift, that is, the accumulation of errors as the sequence grows longer. It's more likely that production-quality real-time matchmoving is accomplished with a special-purpose hardware system, as discussed further in Section 6.8.
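As a toy illustration of drift, the sketch below chains frame-to-frame motion estimates in 2D, with each estimated rotation off by a small constant amount. The setup (straight-line true motion, a fixed `heading_bias_rad`, unit steps) is entirely hypothetical and is only meant to show how a tiny per-frame error compounds over a long sequence.

```python
import math


def drift_after(num_frames, heading_bias_rad=0.001, step=1.0):
    """Position error after chaining num_frames frame-to-frame estimates.

    The true camera moves in a straight line along x. Every estimated
    frame-to-frame rotation is off by heading_bias_rad (an assumed
    constant bias), so the heading error compounds over time.
    """
    heading = 0.0
    x = y = 0.0
    for _ in range(num_frames):
        heading += heading_bias_rad  # small per-frame error accumulates
        x += step * math.cos(heading)
        y += step * math.sin(heading)
    true_x = step * num_frames
    return math.hypot(x - true_x, y)


for n in (100, 200, 400):
    print(n, round(drift_after(n), 2))
```

In this setup, doubling the sequence length more than doubles the position error, because the heading error itself keeps growing.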

6.6.2 Large, Unordered Image Collections

With the advent of large Internet photo collections (e.g., user-contributed photos to Flickr), it has become possible to use structure from motion techniques to reconstruct the accurate 3D structure of a site and the corresponding camera positions using only the images resulting from a simple keyword query. For example, a user can download thousands of images resulting from the query “Statue of Liberty,” and automatically obtain a fairly accurate 3D model of the landmark. However, this problem differs from matchmoving in that the input images no longer have a natural order. Additionally, the images may be taken from widely different positions and under widely different imaging conditions (e.g., zoom, exposure, weather, illumination), requiring wide baseline matching techniques of the type discussed in Chapter 4. Finally, the sheer number of images makes bundle adjustment very challenging.

Snavely et al. [462] proposed a well-known system for the 3D exploration of Internet photo collections called Photo Tourism, which combined a large-scale structure from motion problem with an intuitive user interface to browse through images and camera positions. The system is especially effective for navigating images of tourist sites that have been acquired by thousands of users from different perspectives and in different viewing conditions. Rather than beginning with a projective reconstruction and upgrading it to a Euclidean one, Photo Tourism directly bundle adjusts over each camera's external parameters and unknown focal length. SIFT feature matches and tracks are estimated across the large image set, and fundamental matrices are estimated for each pair of images that contain a sufficient number of matches. To obtain an initial estimate for the bundle adjustment, the camera parameters are first estimated for an image pair with a large baseline, a large number of matches, and known focal lengths. Then overlapping cameras and 3D points are incrementally added to the system using resectioning, triangulation, and bundle adjustment.
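The incremental strategy described above can be sketched as a scheduling problem over pairwise match counts. The code below only decides the order in which cameras would be registered. It performs no geometry, and it simplifies the criteria in the text: the seed pair is chosen by match count alone (baseline and focal-length checks are omitted), and the next camera is the one with the most matches to already registered cameras. The function name `registration_order`, the `min_matches` threshold, and the input format are assumptions for illustration, not the interface of Photo Tourism.

```python
def registration_order(match_counts, min_matches=20):
    """Order cameras for an incremental reconstruction.

    match_counts: dict mapping an image pair (a, b) to its number of
    feature matches. Pairs below min_matches (an assumed threshold)
    are treated as unconnected.
    """
    # Keep only pairs with enough matches to relate the two images.
    edges = {pair: n for pair, n in match_counts.items() if n >= min_matches}
    if not edges:
        return []

    # Seed with the best-matched pair (a stand-in for the fuller criteria).
    seed = max(edges, key=edges.get)
    order = list(seed)
    registered = set(seed)

    while True:
        # Score each unregistered camera by its matches to registered ones.
        scores = {}
        for (a, b), n in edges.items():
            if (a in registered) != (b in registered):
                new_cam = b if a in registered else a
                scores[new_cam] = scores.get(new_cam, 0) + n
        if not scores:
            break  # remaining cameras do not overlap the reconstruction
        best = max(scores, key=scores.get)
        # A real system would resection this camera, triangulate new
        # points, and rerun bundle adjustment at this step.
        order.append(best)
        registered.add(best)
    return order


counts = {("a", "b"): 120, ("b", "c"): 60, ("a", "c"): 45,
          ("c", "d"): 30, ("d", "e"): 5}
print(registration_order(counts))
```

In this example image "e" is never registered, because its only connection falls below the match threshold. Images that cannot be linked to the growing reconstruction are simply left out.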

Both the feature matching and bundle adjustment steps are extremely time-consuming. Parallel processing, either on a multinode compute cluster [5] or a single PC with multiple GPUs [152], can be used to accelerate feature matching. The main trick is the careful assessment of which pairs of images are worth matching, to obtain clusters of images with similar appearance. Snavely et al. [463] described how to
