Integrated Frameworks for Three-Dimensional Scene Reconstruction - 3D Computer Vision: Efficient Methods and Applications


with the absolute depth data of depth from defocus. A work related to the presented approach has been published by Myles and da Vitoria Lobo (1998), where a method to recover affine motion and defocus simultaneously is proposed. However, the spatial extent of the scene is not reconstructed by their method, since it requires planar objects. In contrast, the method described in this section yields a three-dimensional scene reconstruction at absolute scale based on an image sequence acquired with a monocular camera.

5.1.1 Combining Motion, Structure, and Defocus

The structure from motion analysis employed in this section involves the extraction of salient features from the image sequence, which are tracked using the KLT technique (Lucas and Kanade, 1981; Shi and Tomasi, 1994). A depth from defocus analysis is performed for these features according to the method introduced by Kuhl et al. (2006), as described in detail in Sect. 4.2.3. We found experimentally that the random scatter of the feature positions extracted by the KLT tracker is largely independent of the image blur for PSF radii smaller than 5 pixels and is always of the order of 0.1 pixel. However, more features are detected and fewer features are lost by the tracker when the tracking procedure is started on a well-focused image. Hence, the tracking procedure is repeated, starting from the sharpest image near the middle of the sequence, i.e. the image that displays the largest value of $H$ according to (4.17) averaged over all previously detected features. Tracking then proceeds towards either end of the sequence, using the regions of interest (ROIs) extracted from this image as reference patterns. The three-dimensional coordinates ${}^{W}\mathbf{x}_k$ of the scene points are then computed by extending the classical bundle adjustment error term (1.25) with an additional error term that takes into account the depth from defocus measurements, leading to the combined error term

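$$
E_\mathrm{comb} = \sum_{i=1}^{L} \sum_{k=1}^{K} \left\{ \left[ {}^{S_i}_{I_i}T^{-1}\, Q\!\left( {}^{C_i}_{W}T, \{c_j\}_i, {}^{W}\mathbf{x}_k \right) - {}^{S_i}_{I_i}T^{-1}\, {}^{S_i}\mathbf{x}_k \right]^2 + \alpha \left[ S\!\left( \left[ {}^{C_i}_{W}T\, {}^{W}\mathbf{x}_k \right]_z \right) - \sigma_{ik} \right]^2 \right\}. \tag{5.1}
$$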
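To make the structure of the combined error term (5.1) concrete, the following minimal Python sketch evaluates it for given camera poses and scene points. It is an illustration under stated assumptions, not the implementation used in the text: `project` is a hypothetical stand-in for the projection function, reduced to an ideal pinhole camera with focal length `f_px` in pixels (no lens distortion, sensor-to-image transform folded into the pixel units), and `depth_defocus` is a placeholder for the depth-defocus function with the made-up parameters `z_focus` and `kappa`. The calibrated function of Sect. 4.2.3 would take its place in practice.

```python
def transform(R, t, x):
    """Map a world point x into the camera frame: R x + t."""
    return [sum(R[r][c] * x[c] for c in range(3)) + t[r] for r in range(3)]


def project(R, t, f_px, x_world):
    """Assumed ideal pinhole projection (stand-in for Q), in pixels."""
    xc, yc, zc = transform(R, t, x_world)
    return (f_px * xc / zc, f_px * yc / zc)


def depth_defocus(z, z_focus, kappa):
    """Placeholder depth-defocus function: expected PSF radius at depth z.

    Assumed form: blur grows with |1/z - 1/z_focus|; not the calibrated
    function from the text.
    """
    return kappa * abs(1.0 / z - 1.0 / z_focus)


def combined_error(poses, points, observations, sigmas,
                   alpha, f_px, z_focus, kappa):
    """Sum over images i and features k of the squared reprojection
    residual plus alpha times the squared defocus residual.

    poses[i] = (R, t), points[k] = [x, y, z],
    observations[i][k] = measured (u, v),
    sigmas[i][k] = measured PSF radius of feature k in image i.
    """
    total = 0.0
    for i, (R, t) in enumerate(poses):
        for k, x_world in enumerate(points):
            u, v = project(R, t, f_px, x_world)
            u_obs, v_obs = observations[i][k]
            reprojection = (u - u_obs) ** 2 + (v - v_obs) ** 2
            depth = transform(R, t, x_world)[2]
            defocus = (depth_defocus(depth, z_focus, kappa) - sigmas[i][k]) ** 2
            total += reprojection + alpha * defocus
    return total
```

In a full optimisation this function would be handed to a nonlinear least-squares solver that varies the poses and points. Here it only shows how the two residual types are accumulated and how `alpha` balances pixel errors against PSF radius errors.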

The error term $E_\mathrm{comb}$ is minimised with respect to the $L$ camera transforms ${}^{C_i}_{W}T$ and the $K$ scene points ${}^{W}\mathbf{x}_k$. The value of $\sigma_{ik}$ corresponds to the estimated PSF radius for feature $k$ in image $i$, $\alpha$ is a weighting factor, $S$ the depth-defocus function that yields the expected PSF radius of feature $k$ in image $i$, and $\left[{}^{C_i}_{W}T\,{}^{W}\mathbf{x}_k\right]_z$ the $z$ coordinate (depth) of a scene point. The estimated radii $\sigma_{ik}$ of the Gaussian PSFs define a regularisation term in (5.1), such that absolutely scaled three-dimensional coordinates ${}^{W}\mathbf{x}_k$ of the scene points are obtained. The values of ${}^{W}\mathbf{x}_k$ are initialised according to the depth values estimated based on the depth from defocus approach.
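The reason why the defocus term fixes the absolute scale can be made explicit with a short worked argument. It is a sketch under two assumptions not spelled out in the text: the camera transform acts as a rigid transform with rotation $R_i$ and translation $\mathbf{t}_i$, and the projection depends on the camera coordinates $(x_c, y_c, z_c)$ only through the ratios $x_c/z_c$ and $y_c/z_c$, as for a pinhole camera with or without radial distortion.

```latex
% Assumed: the camera transform maps x to R_i x + t_i, and the projection
% depends on camera coordinates only through x_c / z_c and y_c / z_c.
\begin{align*}
{}^{W}\mathbf{x}_k \to s\,{}^{W}\mathbf{x}_k,\quad \mathbf{t}_i \to s\,\mathbf{t}_i
  \;&\Rightarrow\; R_i\,(s\,{}^{W}\mathbf{x}_k) + s\,\mathbf{t}_i
     = s\,(R_i\,{}^{W}\mathbf{x}_k + \mathbf{t}_i), \\
\text{reprojection term:}\quad
  \frac{s\,x_c}{s\,z_c} = \frac{x_c}{z_c},\;\;
  \frac{s\,y_c}{s\,z_c} = \frac{y_c}{z_c}
  \;&\Rightarrow\; \text{unchanged for every } s > 0, \\
\text{defocus term:}\quad
  \bigl(S(s\,z_c) - \sigma_{ik}\bigr)^2
  \;&\neq\; \bigl(S(z_c) - \sigma_{ik}\bigr)^2
  \quad \text{in general for } s \neq 1.
\end{align*}
```

The reprojection term alone therefore determines the scene only up to a global scale factor $s$, whereas for $\alpha > 0$ the defocus term penalises wrongly scaled depths, provided $S$ is not constant over the depth range covered by the scene.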

To minimise the error term $E_\mathrm{comb}$, the Levenberg-Marquardt algorithm (Press et al., 2007) is used, and to reduce the effect of outliers, the M-estimator technique (Rey,

