Consumer Robotics: A Platform for Embedding Computer Vision in Everyday Life - Advances in Embedded Computer Vision

the following sections. The work of Cummins et al. [2] takes a more sophisticated approach to recognition, building a visual vocabulary offline and approximating the joint probability distribution of visual words with a Chow-Liu tree. Each view's appearance model is updated upon recognition.
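The Chow-Liu step mentioned above can be illustrated with the textbook construction. First estimate the pairwise mutual information between binary visual-word occurrences, then keep the maximum-weight spanning tree over the words. This is a minimal sketch under assumed inputs: a hypothetical 0/1 matrix `X` with one row per training image and one column per word, mutual information estimated from raw frequencies without smoothing, and Prim's algorithm rooted at word 0. It is not the implementation of Cummins et al. [2].

```python
import itertools

import numpy as np


def mutual_information(a, b):
    """Empirical mutual information (in nats) between two binary sample vectors."""
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            p_ab = np.mean((a == va) & (b == vb))
            if p_ab > 0:
                p_a = np.mean(a == va)
                p_b = np.mean(b == vb)
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi


def chow_liu_tree(X):
    """Edges (parent, child) of a maximum-mutual-information spanning tree.

    X: (n_images, n_words) array of 0/1 word occurrences (hypothetical input).
    The tree is rooted at word 0.
    """
    n_words = X.shape[1]
    mi = np.zeros((n_words, n_words))
    for i, j in itertools.combinations(range(n_words), 2):
        mi[i, j] = mi[j, i] = mutual_information(X[:, i], X[:, j])
    # Prim's algorithm, maximising rather than minimising the edge weight
    in_tree = {0}
    edges = []
    while len(in_tree) < n_words:
        parent, child = max(
            ((i, j) for i in in_tree for j in range(n_words) if j not in in_tree),
            key=lambda edge: mi[edge],
        )
        edges.append((parent, child))
        in_tree.add(child)
    return edges
```

The tree keeps the strongest pairwise dependencies between words, so the joint distribution factorizes into one conditional per edge. The quadratic loop over word pairs is only practical for a toy vocabulary.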

Our view recognition front end bears many similarities to the view-based maps of Konolige et al. [12]. That system constructs views from stereo images and performs two-step recognition, using first a vocabulary tree and then a geometric matching stage. Views (called skeleton frames) are constructed from the output of visual odometry, which requires a frame rate sufficient for tracking. We require only monocular imagery, constructing structured appearance models from two matched views of the same scene. While Konolige et al. use randomized tree signatures for feature matching, we use a simple variant of SIFT features together with local and global feature databases.
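The section does not say which SIFT-matching variant is used. The following is therefore only an assumed illustration of one common way to match SIFT-like descriptors against a feature database: brute-force nearest-neighbour search with Lowe's distance-ratio test. The function name, the `ratio` default of 0.8, and the array shapes are hypothetical choices for this sketch.

```python
import numpy as np


def ratio_test_matches(query, database, ratio=0.8):
    """Match query descriptors to database descriptors with a distance-ratio test.

    query: (n_query, dim) array; database: (n_db, dim) array with n_db >= 2.
    Returns (query_index, database_index) pairs whose nearest neighbour is
    clearly closer than the second-nearest one.
    """
    if database.shape[0] < 2:
        raise ValueError("the ratio test needs at least two database descriptors")
    # Pairwise Euclidean distances, shape (n_query, n_db)
    dists = np.linalg.norm(query[:, None, :] - database[None, :, :], axis=2)
    matches = []
    for qi, row in enumerate(dists):
        nearest, second = np.argsort(row)[:2]
        # Reject ambiguous matches whose best and second-best distances are similar
        if row[nearest] < ratio * row[second]:
            matches.append((qi, int(nearest)))
    return matches
```

A query descriptor lying equidistant from two database entries fails the test and produces no match. This is the behaviour that makes the ratio test useful before any geometric verification stage.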

2.2.2 Graph-Based SLAM

Storing observations and poses in a constraint graph is now a well-explored technique for localization and mapping. The graph formulation provides a straightforward and flexible representation of the underlying Gaussian Markov random field (GMRF) problem that SLAM attempts to solve. The general framework is described in [18], including a description of a graph relaxation procedure identical to batch bundle adjustment in photogrammetry [19]. Relaxation algorithms for SLAM graphs have received much attention, especially with online operation in mind. Olson et al. [16] suggest a stochastic gradient descent method, and Grisetti et al. [7] review that and related methods for incremental graph optimization.
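To make the graph formulation concrete, here is a minimal sketch of batch relaxation on a 1-D pose graph, solved as a weighted linear least-squares problem with NumPy. The 1-D setting, the `(i, j, z, w)` edge layout, and the function name are assumptions made for illustration. Real SLAM graphs over SE(2) or SE(3) poses are nonlinear and are usually solved by repeated linearization; because this 1-D problem is linear, a single least-squares solve suffices.

```python
import numpy as np


def relax_pose_graph_1d(n_poses, edges):
    """Batch least-squares relaxation of a 1-D pose graph.

    edges: iterable of (i, j, z, w) meaning "x_j - x_i was measured as z
    with information (inverse variance) w".
    """
    rows, rhs = [], []
    for i, j, z, w in edges:
        row = np.zeros(n_poses)
        row[i], row[j] = -1.0, 1.0
        s = np.sqrt(w)  # scale each residual by the square root of its information
        rows.append(s * row)
        rhs.append(s * z)
    # Pin pose 0 at the origin. Shifting every pose leaves all relative
    # residuals unchanged, so this removes the gauge freedom without bias.
    anchor = np.zeros(n_poses)
    anchor[0] = 1.0
    rows.append(anchor)
    rhs.append(0.0)
    x, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return x


odometry = [(0, 1, 1.0, 1.0), (1, 2, 1.0, 1.0), (2, 3, 1.0, 1.0)]
loop_closure = [(0, 3, 2.7, 1.0)]
print(relax_pose_graph_1d(4, odometry + loop_closure))
```

With three unit odometry steps and a loop closure measuring 2.7, the 0.3 discrepancy is spread evenly over the four equally weighted constraints, giving poses of about 0, 0.925, 1.85 and 2.775.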

The system of Eade and Drummond [3] forms a graph in which each node is a joint distribution over a local map, and the relative nonlinear constraints between nodes are derived from shared features. The graph is relaxed by imposing cycle constraints using preconditioned gradient descent. The network constructed by PTAM is effectively a graph of relative constraints between keyframes, though the optimization, performed asynchronously to the primary tracking task, acts on individual structure elements.
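Eade and Drummond relax their graph with preconditioned gradient descent over cycle constraints. The sketch below is a deliberately simpler stand-in, not their formulation. It applies Jacobi (diagonal) preconditioned gradient descent to a 1-D least-squares pose-graph cost, with hypothetical names, a fixed step size of 0.5, and pose 0 held at the origin. It shows how dividing each gradient component by the matching diagonal Hessian entry evens out step sizes across poses with different numbers of constraints.

```python
import numpy as np


def jacobi_pgd_1d(n_poses, edges, iters=500, step=0.5):
    """Jacobi-preconditioned gradient descent on a 1-D pose graph.

    edges: iterable of (i, j, z, w) meaning "x_j - x_i was measured as z with
    weight w". Pose 0 stays fixed at the origin; every pose needs an edge.
    Minimises F(x) = 0.5 * sum(w * (x_j - x_i - z)**2).
    """
    edges = list(edges)
    x = np.zeros(n_poses)
    # Diagonal of the Hessian of F: the summed weights of each pose's edges
    diag = np.zeros(n_poses)
    for i, j, _, w in edges:
        diag[i] += w
        diag[j] += w
    for _ in range(iters):
        grad = np.zeros(n_poses)
        for i, j, z, w in edges:
            r = x[j] - x[i] - z  # residual of this relative constraint
            grad[j] += w * r
            grad[i] -= w * r
        grad[0] = 0.0  # pose 0 is the fixed reference
        x -= step * grad / diag
    return x
```

On a four-pose loop with unit odometry steps and a loop closure measuring 2.7, this iteration converges to the same poses as a direct least-squares solve. It does so without ever forming or factoring the full system matrix.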

The view-based mapping of Konolige et al. [12] constructs a reduced graph of poses by consolidating consecutive frames tracked by visual odometry into skeleton frames. The constraint graph over skeleton frames is then incrementally relaxed using the TORO method [8].
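At minimum, consolidating visual-odometry frames into skeleton frames means composing the relative motions between consecutive frames into a single relative constraint. The sketch below does this for planar SE(2) poses `(x, y, theta)`. Its fixed rule of starting a new skeleton frame every `k` frames is a hypothetical simplification, since the criterion Konolige et al. use is not described here. The sketch also ignores how the uncertainty of the composed motions would be propagated.

```python
import math


def compose(a, b):
    """Compose SE(2) poses (x, y, theta): express motion b in the frame of a."""
    ax, ay, at = a
    bx, by, bt = b
    return (
        ax + math.cos(at) * bx - math.sin(at) * by,
        ay + math.sin(at) * bx + math.cos(at) * by,
        math.atan2(math.sin(at + bt), math.cos(at + bt)),  # wrap to (-pi, pi]
    )


def consolidate(relative_motions, k):
    """Collapse per-frame odometry into one constraint per k frames.

    relative_motions[i] is the motion from frame i to frame i + 1.
    Returns a list of (start_frame, end_frame, composed_motion).
    """
    skeleton = []
    for start in range(0, len(relative_motions), k):
        chunk = relative_motions[start:start + k]
        total = (0.0, 0.0, 0.0)  # identity motion
        for motion in chunk:
            total = compose(total, motion)
        skeleton.append((start, start + len(chunk), total))
    return skeleton
```

Four steps of one unit forward, each followed by a 90° left turn, trace a square. Consolidating them with `k=4` therefore yields a single constraint that is, up to rounding, the identity motion.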

While existing graph-based SLAM methods employ incremental graph optimizers to allow online operation, the number of poses in the graph continues to grow with time. One technique suggested for bounding this growth is that the robot be occasionally, virtually “kidnapped”: its current pose in the graph is disconnected from previous poses and re-inserted using only recent observations [8]. This assumes both that the recent observations are sufficiently accurate to allow relocalization and that the effective uncertainty of these observations is zero. These assumptions are routinely
