assumed to lie on planar surfaces. The world coordinate system and its scale are initialized by placing a calibration pattern with known dimensions in front of the camera before it begins to move. The unknown depths of features in the environment are estimated with greater accuracy as the camera views them from different positions. A smoothness prior, which assumes that the camera moves with constant velocity and angular velocity, is imposed to make the camera parameter estimation robust to video segments that contain few features. We discuss probabilistic methods for state estimation in more detail in the context of motion capture in Chapter 7. The book by Thrun et al. [489] is an excellent reference for probabilistic robot localization, though it does not emphasize vision-based SLAM.
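The constant-velocity smoothness prior can be pictured as the prediction step of a filter: between frames, the camera is assumed to keep the same linear and angular velocity. The sketch below shows only that deterministic prediction, under several assumptions of our own: the state layout, a (w, x, y, z) unit quaternion for orientation, angular velocity expressed in the camera frame, and the names `CameraState` and `predict` are all hypothetical, and the noise terms and covariance update of a real filter are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraState:
    # Hypothetical state layout for illustration only.
    p: tuple  # position (x, y, z)
    q: tuple  # orientation as a unit quaternion (w, x, y, z)
    v: tuple  # linear velocity, world frame
    w: tuple  # angular velocity in rad/s, assumed to be in the camera frame


def quat_mul(a, b):
    """Hamilton product a * b of two (w, x, y, z) quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def quat_from_rotvec(r):
    """Unit quaternion for a rotation vector r (axis * angle in radians)."""
    angle = math.sqrt(sum(c * c for c in r))
    if angle < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    s = math.sin(angle / 2) / angle
    return (math.cos(angle / 2), r[0] * s, r[1] * s, r[2] * s)


def normalize(q):
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def predict(state, dt):
    """Constant-velocity, constant-angular-velocity prediction over dt seconds."""
    p = tuple(pi + vi * dt for pi, vi in zip(state.p, state.v))
    dq = quat_from_rotvec(tuple(wi * dt for wi in state.w))
    # Right-multiplying applies the incremental rotation in the camera frame.
    q = normalize(quat_mul(state.q, dq))
    # Velocities carry over unchanged; a filter would add acceleration noise here.
    return CameraState(p, q, state.v, state.w)
```

Because the velocities are simply propagated, a stretch of frames with few feature measurements still yields a plausible pose, while the (omitted) process noise is what would allow the camera to speed up or turn once measurements resume.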
While these techniques are promising, a key consideration is drift, that is, the accumulation of errors as the sequence gets longer and longer. Production-quality real-time matchmoving is more likely to be accomplished with a special-purpose hardware system, as discussed further in Section 6.8.
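A toy illustration of why drift matters: when each pose is estimated relative to the previous one, a small per-frame error is never corrected and compounds along the sequence. The sketch below assumes, purely for illustration, a camera moving in a straight line with unit steps and a constant heading error in every relative-motion estimate; the function name `dead_reckoning_drift` and the numbers are hypothetical, and real drift mixes random and systematic components.

```python
import math


def dead_reckoning_drift(num_steps, heading_bias_deg):
    """Chain unit-length relative motions whose heading estimate is off by a
    constant bias each frame, and return the position error after each step.
    The true path is a straight line along +x."""
    bias = math.radians(heading_bias_deg)
    x = y = 0.0
    heading = 0.0
    errors = []
    for k in range(1, num_steps + 1):
        heading += bias  # the per-frame error is never corrected...
        x += math.cos(heading)
        y += math.sin(heading)
        errors.append(math.hypot(x - k, y))  # ...so the position error compounds
    return errors


errs = dead_reckoning_drift(100, 0.5)
print(f"error after 10 frames: {errs[9]:.2f}, after 100 frames: {errs[99]:.2f}")
```

Under this assumption a 10-fold longer sequence produces far more than a 10-fold larger position error, since the heading error itself grows with every frame.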
6.6.2 Large, Unordered Image Collections
With the advent of large Internet photo collections (e.g., user-contributed photos on Flickr), it has become possible to use structure from motion techniques to reconstruct the accurate 3D structure of a site and the corresponding camera positions using only the images returned by a simple keyword query. For example, a user can download thousands of images returned by the query “Statue of Liberty” and automatically obtain a fairly accurate 3D model of the landmark. However, this problem differs from matchmoving in that the input images no longer have a natural order. Additionally, the images may be taken from widely different positions and under widely different imaging conditions (e.g., zoom, exposure, weather, illumination), requiring wide baseline matching techniques of the type discussed in Chapter 4. Finally, the sheer number of images makes bundle adjustment very challenging.
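To make the scale of the bundle adjustment concrete, the following sketch counts unknowns and residuals for a hypothetical photo collection. It assumes 7 parameters per camera (3 for rotation, 3 for translation, 1 for focal length) and 3 per 3D point, that each image observation of a point contributes a 2D reprojection residual, and that the points are eliminated with the Schur complement to leave a reduced camera system. The function name and all input figures are illustrative, not measurements from any real system.

```python
def bundle_adjustment_size(num_cameras, num_points, num_observations,
                           params_per_camera=7, params_per_point=3):
    """Rough problem-size estimate for a bundle adjustment (illustrative)."""
    unknowns = num_cameras * params_per_camera + num_points * params_per_point
    # Each 2D observation gives an (x, y) reprojection error.
    residuals = 2 * num_observations
    # Size of the reduced camera system after eliminating the 3D points.
    schur_dim = num_cameras * params_per_camera
    # Memory for that system if it were stored densely in float64.
    schur_bytes_dense = schur_dim ** 2 * 8
    return unknowns, residuals, schur_dim, schur_bytes_dense


# Hypothetical collection: 3,000 photos, 1,000,000 points, 4 views per point.
u, r, s, b = bundle_adjustment_size(3000, 1_000_000, 4_000_000)
print(u, r, s, b / 1e9)
```

With these made-up numbers there are about three million unknowns, and even the reduced camera system is a 21,000 × 21,000 matrix, roughly 3.5 GB if stored densely in double precision, which is one reason large-scale solvers rely on sparse linear algebra rather than dense solves.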
Snavely et al. [462] proposed a well-known system for the 3D exploration of Internet photo collections called Photo Tourism, which combined a large-scale structure from motion problem with an intuitive user interface for browsing through images and camera positions. The system is especially effective for navigating images of tourist sites that have been acquired by thousands of users from different perspectives and under different viewing conditions. Rather than beginning with a projective reconstruction and upgrading it to a Euclidean one, Photo Tourism directly bundle adjusts over each camera's external parameters and unknown focal length. SIFT feature matches and tracks are estimated across the large image set, and fundamental matrices are estimated for each pair of images that contain a sufficient number of matches. To obtain an initial estimate for the bundle adjustment, the camera parameters are first estimated for an image pair with a large baseline, a large number of matches, and known focal lengths. Overlapping cameras and 3D points are then incrementally added to the system using resectioning, triangulation, and bundle adjustment.
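The incremental strategy can be sketched as a scheduling problem: pick a seed pair, then repeatedly register the camera that sees the most already-reconstructed points. This is a simplified sketch, not Photo Tourism's actual code. The function names, the input formats, and the caller-supplied `baseline_score` (any measure where larger means a wider baseline) are hypothetical; the greedy "most triangulated tracks" rule is an assumption of this sketch; and the default `min_points=6` reflects that linear (DLT) resectioning of a 3×4 projection matrix needs at least six 2D-3D correspondences. The geometric steps themselves (resectioning, triangulation, bundle adjustment) are left as a comment.

```python
def choose_initial_pair(pairs, focal_known, min_matches=100):
    """Pick the seed pair for incremental reconstruction.

    pairs: {(i, j): (num_matches, baseline_score)}; larger baseline_score
      means a wider baseline (hypothetical, caller-supplied measure).
    focal_known: {image_id: bool}, e.g. True if EXIF gives the focal length.
    """
    candidates = [
        (baseline, matches, pair)
        for pair, (matches, baseline) in pairs.items()
        if matches >= min_matches and focal_known[pair[0]] and focal_known[pair[1]]
    ]
    if not candidates:
        return None
    # Widest baseline wins; ties are broken by the number of matches.
    return max(candidates)[2]


def incremental_order(pairs, tracks, focal_known, min_matches=100, min_points=6):
    """Return the order in which cameras would be registered.

    tracks: list of sets, each holding the ids of the images that observe
      one feature track.
    """
    first = choose_initial_pair(pairs, focal_known, min_matches)
    if first is None:
        return []
    registered = set(first)
    order = list(first)
    all_images = set().union(*tracks)
    while True:
        # A track has a 3D point once at least two registered cameras see it.
        triangulated = [t for t in tracks if len(t & registered) >= 2]
        best_img, best_count = None, 0
        for img in sorted(all_images - registered):
            count = sum(1 for t in triangulated if img in t)
            if count > best_count:
                best_img, best_count = img, count
        if best_img is None or best_count < min_points:
            break  # no remaining camera sees enough known 3D points to resection
        # A real system would now resection best_img from its 2D-3D matches,
        # triangulate newly observed tracks, and re-run bundle adjustment.
        registered.add(best_img)
        order.append(best_img)
    return order
```

Note that a camera without a known focal length can still be added after the seed pair; its focal length then becomes one of the unknowns in the bundle adjustment.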
Both the feature matching and bundle adjustment steps are extremely time-consuming. Parallel processing, either on a multinode compute cluster [5] or on a single PC with multiple GPUs [152], can be used to accelerate feature matching. The main trick is to carefully assess which pairs of images are worth matching, so as to obtain clusters of images with similar appearance. Snavely et al. [463] described how to