First, we use a dense motion stereo method to determine a 3D model of the scene beneath the MAV. We present a frame list approach with variable baseline which enables arbitrary selection of the depth accuracy of the 3D model, as long as the motion between an image pair can be estimated correctly. The scale can be determined from any metric pose estimator or altitude sensor; here, we use the pose estimator presented in Sect. 4.2. Second, we analyze the 3D model in order to find potential landing candidates. Most of the mentioned work uses the dimensions of the MAV and the size, planarity, and slope of the landing spot as the main criteria of landability. We reduce all these criteria to simple steps which enable an efficient onboard implementation. Third, we pick the most promising candidate and approach it, e.g., with a two-waypoint trajectory. Figure 4.17 illustrates the processing pipeline of our autonomous landing approach. Besides experiments where we actually land autonomously in a controlled environment, we present a more detailed analysis of the system performance with hand-labeled ground truth data.
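The three stages map naturally onto a simple processing loop. The following Python sketch is illustrative only; all function and type names (reconstruct_scene, find_landing_candidates, plan_two_waypoint_trajectory, Candidate) are hypothetical placeholders for the components described in the remainder of this section, not the authors' actual interfaces.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    xyz: tuple    # landing-spot position in the world frame
    score: float  # combined landability score

# Hypothetical placeholder stages: in the real system these are the dense
# motion stereo reconstruction (Sect. 4.4.2.1), the landing-site analysis,
# and the approach planner described in this section.
def reconstruct_scene(frame, state) -> Optional[list]:
    return []

def find_landing_candidates(cloud) -> List[Candidate]:
    return [Candidate(xyz=(0.0, 0.0, 0.0), score=1.0)]

def plan_two_waypoint_trajectory(c: Candidate):
    above = (c.xyz[0], c.xyz[1], c.xyz[2] + 2.0)  # hover point above the spot
    return [above, c.xyz]                         # then descend to touchdown

def landing_pipeline(frame, state):
    cloud = reconstruct_scene(frame, state)        # 1) 3D reconstruction
    if cloud is None:
        return None
    candidates = find_landing_candidates(cloud)    # 2) landability analysis
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.score)  # 3) pick the best candidate
    return plan_two_waypoint_trajectory(best)
```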
4.4.2.1 3D Reconstruction
Dense motion stereo is based on the same principle as conventional stereo, with
the difference that the two views of the captured scene are generated by a single
moving camera instead of a rigid stereo bar. The extrinsic parameters (rotation R and
translation t between the two camera positions) have to be determined for each image
pair individually. Translation can be estimated up to scale using visual information
only. We assume the intrinsic parameters do not change and calibrate them in advance.
We use a CAHVORE camera model [17] to model lens effects and to generate
linearized camera models that describe the perspective projection.
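As a reminder of why translation is recoverable only up to scale: for a calibrated pair, corresponding normalized image points satisfy the epipolar constraint below (a standard two-view result, not specific to this system), and scaling t by any nonzero factor leaves the constraint unchanged.

```latex
% Standard epipolar constraint for a calibrated image pair:
% x and x' are corresponding normalized image points.
\[
  \mathbf{x}'^{\top} E\, \mathbf{x} = 0,
  \qquad E = [\mathbf{t}]_{\times} R,
  \qquad [\mathbf{t}]_{\times} =
  \begin{pmatrix}
     0   & -t_z &  t_y \\
     t_z &  0   & -t_x \\
    -t_y &  t_x &  0
  \end{pmatrix},
\]
% E (and hence t) is determined only up to scale, which is why an
% external metric reference is required to fix the baseline length.
```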
For the selection of a proper image pair, we maintain a frame list of the last n images. Each element of the frame list consists of the camera image, the camera pose in the world frame, the extracted features (STAR [1], MSURF [9]), and a feature track list that records how often each feature has been found in the frame list.
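A minimal sketch of such a frame list, assuming Python and hypothetical field names; the actual feature types (STAR keypoints with MSURF descriptors) are abstracted as plain arrays.

```python
from collections import deque
from dataclasses import dataclass
import numpy as np

@dataclass
class FrameEntry:
    image: np.ndarray        # camera image
    pose: np.ndarray         # 4x4 camera-to-world transform
    keypoints: np.ndarray    # Nx2 feature locations (e.g., STAR)
    descriptors: np.ndarray  # NxD feature descriptors (e.g., MSURF)
    track_ids: np.ndarray    # N track IDs linking features across frames

class FrameList:
    """Ring buffer of the last n frames."""
    def __init__(self, n: int = 10):
        self.frames = deque(maxlen=n)

    def push(self, entry: FrameEntry) -> None:
        self.frames.append(entry)

    def track_length(self, track_id: int) -> int:
        # How often a feature track has been found across the buffered frames.
        return sum(int(track_id in f.track_ids) for f in self.frames)
```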
Given this data, we can select image pairs using two criteria. First, since depth accuracy is a function of the stereo baseline, we look for images that are far enough apart to achieve sufficient depth accuracy (at ground level) at the current altitude of the MAV. Second, we choose the image that exceeds a minimum number of successive feature matches with the current image.
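The first criterion can be made concrete with the textbook stereo depth-error relation δZ ≈ Z²·δd/(f·b), where f is the focal length in pixels, δd the disparity uncertainty, and b the baseline: solving for b gives the minimum baseline needed at altitude Z. The snippet below is a sketch under that standard model; the numbers are illustrative, not the authors' parameters.

```python
def min_baseline(altitude_m: float, focal_px: float,
                 depth_err_m: float, disp_err_px: float = 0.5) -> float:
    """Smallest stereo baseline that keeps the expected depth error at
    ground level below depth_err_m, using dZ ~ Z^2 * dd / (f * b)."""
    return altitude_m ** 2 * disp_err_px / (focal_px * depth_err_m)

# Example (illustrative numbers): at 10 m altitude with f = 500 px and
# half-pixel disparity noise, 10 cm depth accuracy needs b >= 1.0 m.
b = min_baseline(altitude_m=10.0, focal_px=500.0, depth_err_m=0.10)
```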
As soon as an image pair is found, we estimate R and t between the images with a multiplanar homography alignment approach [12]. Since we can estimate translation only up to scale from purely visual information (without some metric context), the translation vector is then scaled with the real-world baseline from the visual-inertial state estimator described in Sect. 4.2. Having R and t, stereo rectification can be applied.
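With R and the metrically scaled t, rectification follows the standard pipeline. A sketch using OpenCV's implementation (the system described here works on linearized CAHVORE models, so this is an approximation of that step, not the authors' code):

```python
import cv2
import numpy as np

def rectify_pair(img1, img2, K, R, t):
    """Rectify a motion-stereo pair given shared intrinsics K and the
    relative pose (R, t) between the two camera positions."""
    size = (img1.shape[1], img1.shape[0])
    dist = np.zeros(5)  # linearized models: no remaining lens distortion
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K, dist, K, dist, size, R, t)
    m1 = cv2.initUndistortRectifyMap(K, dist, R1, P1, size, cv2.CV_32FC1)
    m2 = cv2.initUndistortRectifyMap(K, dist, R2, P2, size, cv2.CV_32FC1)
    rect1 = cv2.remap(img1, m1[0], m1[1], cv2.INTER_LINEAR)
    rect2 = cv2.remap(img2, m2[0], m2[1], cv2.INTER_LINEAR)
    return rect1, rect2, Q  # Q reprojects disparities to 3D points
```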
The quality of the motion estimation strongly depends on the accuracy of the feature locations and is thus scene dependent. To discard poor motion estimates and prevent an incorrect 3D reconstruction, we calculate the average 3D reprojection error of the feature pairs and accept only image pairs with an error in the subpixel range.
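A sketch of such a gate, assuming matched feature locations in both images and Python/OpenCV; the threshold and function name are illustrative, not the authors' implementation.

```python
import cv2
import numpy as np

def mean_reprojection_error(pts1, pts2, K, R, t) -> float:
    """Triangulate matched points (Nx2 arrays) and return the average
    pixel error when reprojecting them back into the second view."""
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])
    X = cv2.triangulatePoints(P1, P2,
                              pts1.T.astype(float), pts2.T.astype(float))
    X = (X[:3] / X[3]).T  # Nx3 points in the first camera frame
    proj, _ = cv2.projectPoints(X, cv2.Rodrigues(R)[0],
                                t.reshape(3, 1), K, None)
    return float(np.mean(np.linalg.norm(proj.reshape(-1, 2) - pts2, axis=1)))

# Accept the pair only if the average error is in the subpixel range:
# accept = mean_reprojection_error(pts1, pts2, K, R, t) < 1.0
```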
Finally, we use a real-time sum-of-absolute-differences (SAD) stereo matching algorithm to estimate a disparity map, from which we generate a 3D point cloud that models the captured scene beneath the MAV.
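An illustrative version of this final step, using OpenCV's SAD-based block matcher as a stand-in for the authors' real-time implementation; Q is the reprojection matrix obtained during rectification.

```python
import cv2

def disparity_to_cloud(rect_left, rect_right, Q):
    """SAD block matching on an 8-bit grayscale rectified pair,
    followed by reprojection of valid disparities to 3D points."""
    bm = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disp = bm.compute(rect_left, rect_right).astype('float32') / 16.0
    cloud = cv2.reprojectImageTo3D(disp, Q)  # HxWx3 points, camera frame
    return cloud[disp > 0]                   # keep pixels with valid matches
```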