First, we use a dense motion stereo method to determine a 3D model of the scene beneath the MAV. We present a frame list approach with variable baseline which enables arbitrary selection of the depth accuracy of the 3D model, as long as the motion between an image pair can be estimated correctly. The scale can be determined from any metric pose estimator or altitude sensor; here, we use the pose estimator presented in Sect. 4.2. Second, we analyze the 3D model in order to find potential landing candidates. Most of the mentioned work uses the dimensions of the MAV and the size, planarity, and slope of the landing spot as the main criteria of landability. We reduce all these criteria to simple steps which enable an efficient onboard implementation. Third, we pick the most promising candidate and approach it, e.g., with a two-waypoint trajectory. Figure 4.17 illustrates the processing pipeline of our autonomous landing approach. Besides experiments where we actually land autonomously in a controlled environment, we present a more detailed analysis of the system performance with hand-labeled ground truth data.
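The three stages map naturally onto a simple processing loop. The following Python sketch is illustrative only; all function and type names (reconstruct_scene, find_landing_candidates, plan_two_waypoint_trajectory, Candidate) are hypothetical placeholders for the components described in the remainder of this section, not the authors' actual interfaces.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    xyz: tuple    # landing-spot position in the world frame
    score: float  # combined landability score

# Hypothetical placeholder stages: in the real system these are the dense
# motion stereo reconstruction (Sect. 4.4.2.1), the landing-site analysis,
# and the approach planner described in this section.
def reconstruct_scene(frame, state) -> Optional[list]:
    return []

def find_landing_candidates(cloud) -> List[Candidate]:
    return [Candidate(xyz=(0.0, 0.0, 0.0), score=1.0)]

def plan_two_waypoint_trajectory(c: Candidate):
    above = (c.xyz[0], c.xyz[1], c.xyz[2] + 2.0)  # hover point above the spot
    return [above, c.xyz]                         # then descend to touchdown

def landing_pipeline(frame, state):
    cloud = reconstruct_scene(frame, state)        # 1) 3D reconstruction
    if cloud is None:
        return None
    candidates = find_landing_candidates(cloud)    # 2) landability analysis
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.score)  # 3) pick the best candidate
    return plan_two_waypoint_trajectory(best)
```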
4.4.2.1 3D Reconstruction
Dense motion stereo is based on the same principle as conventional stereo, with
the difference that the two views of the captured scene are generated by a single
moving camera instead of a rigid stereo bar. The extrinsic parameters (rotation R and
translation t between the two camera positions) have to be determined for each image
pair individually. Translation can be estimated up to scale using visual information
only. We assume the intrinsic parameters do not change and calibrate them in advance.
We use a CAHVORE camera model [17] to model lens effects and to generate
linearized camera models that describe the perspective projection.
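As a reminder of why translation is recoverable only up to scale: for a calibrated pair, corresponding normalized image points satisfy the epipolar constraint below (a standard two-view result, not specific to this system), and scaling t by any nonzero factor leaves the constraint unchanged.

```latex
% Standard epipolar constraint for a calibrated image pair:
% x and x' are corresponding normalized image points.
\[
  \mathbf{x}'^{\top} E\, \mathbf{x} = 0,
  \qquad E = [\mathbf{t}]_{\times} R,
  \qquad [\mathbf{t}]_{\times} =
  \begin{pmatrix}
     0   & -t_z &  t_y \\
     t_z &  0   & -t_x \\
    -t_y &  t_x &  0
  \end{pmatrix},
\]
% E (and hence t) is determined only up to scale, which is why an
% external metric reference is required to fix the baseline length.
```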
For the selection of a proper image pair, we maintain a frame list of the last n images. Each element of the frame list consists of the camera image, the camera pose in the world frame, the extracted features (STAR [1], MSURF [9]), and a feature track list that records how often each feature has been found in the frame list.
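A minimal sketch of such a frame list, assuming Python and hypothetical field names; the actual feature types (STAR keypoints with MSURF descriptors) are abstracted as plain arrays.

```python
from collections import deque
from dataclasses import dataclass
import numpy as np

@dataclass
class FrameEntry:
    image: np.ndarray        # camera image
    pose: np.ndarray         # 4x4 camera-to-world transform
    keypoints: np.ndarray    # Nx2 feature locations (e.g., STAR)
    descriptors: np.ndarray  # NxD feature descriptors (e.g., MSURF)
    track_ids: np.ndarray    # N track IDs linking features across frames

class FrameList:
    """Ring buffer of the last n frames."""
    def __init__(self, n: int = 10):
        self.frames = deque(maxlen=n)

    def push(self, entry: FrameEntry) -> None:
        self.frames.append(entry)

    def track_length(self, track_id: int) -> int:
        # How often a feature track has been found across the buffered frames.
        return sum(int(track_id in f.track_ids) for f in self.frames)
```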
Given this data, we can select image pairs using two criteria. First, since depth accuracy is a function of the stereo baseline, we look for images that are far enough apart to achieve sufficient depth accuracy (at ground level) at the current altitude of the MAV. Second, we choose the image that exceeds a minimum number of successive feature matches with the current image.
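The first criterion can be made concrete with the textbook stereo depth-error relation δZ ≈ Z²·δd/(f·b), where f is the focal length in pixels, δd the disparity uncertainty, and b the baseline: solving for b gives the minimum baseline needed at altitude Z. The snippet below is a sketch under that standard model; the numbers are illustrative, not the authors' parameters.

```python
def min_baseline(altitude_m: float, focal_px: float,
                 depth_err_m: float, disp_err_px: float = 0.5) -> float:
    """Smallest stereo baseline that keeps the expected depth error at
    ground level below depth_err_m, using dZ ~ Z^2 * dd / (f * b)."""
    return altitude_m ** 2 * disp_err_px / (focal_px * depth_err_m)

# Example (illustrative numbers): at 10 m altitude with f = 500 px and
# half-pixel disparity noise, 10 cm depth accuracy needs b >= 1.0 m.
b = min_baseline(altitude_m=10.0, focal_px=500.0, depth_err_m=0.10)
```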
As soon as an image pair is found, we estimate R and t between the images with a multiplanar homography alignment approach [12]. Since we can estimate translation only up to scale from purely visual information (without some metric context), the translation vector is then scaled with the real-world baseline from the visual-inertial state estimator described in Sect. 4.2. Having R and t, stereo rectification can be applied.
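With R and the metrically scaled t, rectification follows the standard pipeline. A sketch using OpenCV's implementation (the system described here works on linearized CAHVORE models, so this is an approximation of that step, not the authors' code):

```python
import cv2
import numpy as np

def rectify_pair(img1, img2, K, R, t):
    """Rectify a motion-stereo pair given shared intrinsics K and the
    relative pose (R, t) between the two camera positions."""
    size = (img1.shape[1], img1.shape[0])
    dist = np.zeros(5)  # linearized models: no remaining lens distortion
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K, dist, K, dist, size, R, t)
    m1 = cv2.initUndistortRectifyMap(K, dist, R1, P1, size, cv2.CV_32FC1)
    m2 = cv2.initUndistortRectifyMap(K, dist, R2, P2, size, cv2.CV_32FC1)
    rect1 = cv2.remap(img1, m1[0], m1[1], cv2.INTER_LINEAR)
    rect2 = cv2.remap(img2, m2[0], m2[1], cv2.INTER_LINEAR)
    return rect1, rect2, Q  # Q reprojects disparities to 3D points
```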
The quality of the motion estimation strongly depends on the accuracy of the feature locations and is thus scene dependent. To discard poor motion estimates and prevent an incorrect 3D reconstruction, we calculate the average 3D reprojection error of the feature pairs and accept only image pairs with an error in the subpixel range.
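A sketch of such a gate, assuming matched feature locations in both images and Python/OpenCV; the threshold and function name are illustrative, not the authors' implementation.

```python
import cv2
import numpy as np

def mean_reprojection_error(pts1, pts2, K, R, t) -> float:
    """Triangulate matched points (Nx2 arrays) and return the average
    pixel error when reprojecting them back into the second view."""
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])
    X = cv2.triangulatePoints(P1, P2,
                              pts1.T.astype(float), pts2.T.astype(float))
    X = (X[:3] / X[3]).T  # Nx3 points in the first camera frame
    proj, _ = cv2.projectPoints(X, cv2.Rodrigues(R)[0],
                                t.reshape(3, 1), K, None)
    return float(np.mean(np.linalg.norm(proj.reshape(-1, 2) - pts2, axis=1)))

# Accept the pair only if the average error is in the subpixel range:
# accept = mean_reprojection_error(pts1, pts2, K, R, t) < 1.0
```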
Finally, we use a real-time sum-of-absolute-differences (SAD) stereo matching algorithm to estimate a disparity map, from which we generate a 3D point cloud that models the captured scene beneath the MAV.
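An illustrative version of this final step, using OpenCV's SAD-based block matcher as a stand-in for the authors' real-time implementation; Q is the reprojection matrix obtained during rectification.

```python
import cv2

def disparity_to_cloud(rect_left, rect_right, Q):
    """SAD block matching on an 8-bit grayscale rectified pair,
    followed by reprojection of valid disparities to 3D points."""
    bm = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disp = bm.compute(rect_left, rect_right).astype('float32') / 16.0
    cloud = cv2.reprojectImageTo3D(disp, Q)  # HxWx3 points, camera frame
    return cloud[disp > 0]                   # keep pixels with valid matches
```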