with the absolute depth data of depth from defocus. A work related to the presented
approach has been published by Myles and da Vitoria Lobo (1998), where a method
to recover affine motion and defocus simultaneously is proposed. However, the
spatial extent of the scene is not reconstructed by their method, since it requires planar
objects. In contrast, the method described in this section yields a three-dimensional
scene reconstruction at absolute scale based on an image sequence acquired with a
monocular camera.
5.1.1 Combining Motion, Structure, and Defocus
The structure from motion analysis employed in this section involves the extraction
of salient features from the image sequence, which are tracked using the KLT
technique (Lucas and Kanade, 1981; Shi and Tomasi, 1994). A depth from defocus
analysis is performed for these features according to the method introduced by Kuhl
et al. (2006), as described in detail in Sect. 4.2.3. We found experimentally that the
random scatter of the feature positions extracted by the KLT tracker is largely
independent of the image blur for PSF radii smaller than 5 pixels and is always of
the order of 0.1 pixel. However, more features are detected and fewer features are
lost by the tracker when the tracking procedure is started on a well-focused image.
Hence, the tracking procedure is repeated, starting from the sharpest image located
near the middle of the sequence, i.e. the image which displays the largest value of H
according to (4.17) averaged over all previously detected features. Tracking then
proceeds towards either end of the sequence, using the regions of interest (ROIs)
extracted from this image as reference patterns. The three-dimensional coordinates
${}^{W}\mathbf{x}_k$ of the scene points are then computed by extending the classical
bundle adjustment error term (1.25) with an additional error term that takes into
account the depth from defocus measurements, leading to the combined error term
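\[
E_{\mathrm{comb}} = \sum_{i=1}^{L} \sum_{k=1}^{K} \left[ \left\| {}^{S_i}_{I_i}T^{-1}\, Q\!\left({}^{C_i}_{W}T, \{c_j\}_i, {}^{W}\mathbf{x}_k\right) - {}^{S_i}_{I_i}T^{-1}\, {}^{S_i}\mathbf{x}_k \right\|^2 + \alpha \left( S\!\left(\left[{}^{C_i}_{W}T\, {}^{W}\mathbf{x}_k\right]_z\right) - \sigma_{ik} \right)^2 \right]. \tag{5.1}
\]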
The error term $E_{\mathrm{comb}}$ is minimised with respect to the $L$ camera transforms
${}^{C_i}_{W}T$ and the $K$ scene points ${}^{W}\mathbf{x}_k$. The value of $\sigma_{ik}$
corresponds to the estimated PSF radius for feature $k$ in image $i$, $\alpha$ is a
weighting factor, $S$ is the depth-defocus function that yields the expected PSF radius
of feature $k$ in image $i$, and $[{}^{C_i}_{W}T\, {}^{W}\mathbf{x}_k]_z$ is the $z$
coordinate (depth) of a scene point. The estimated radii $\sigma_{ik}$ of the Gaussian PSFs
define a regularisation term in (5.1), such that absolutely scaled three-dimensional
coordinates ${}^{W}\mathbf{x}_k$ of the scene points are obtained. The values of
${}^{W}\mathbf{x}_k$ are initialised according to the depth values estimated based on the
depth from defocus approach.
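The structure of the combined error term (5.1) can be illustrated with a minimal Python sketch. It is not the implementation used in this section and rests on several simplifying assumptions: the projection function Q is reduced to an ideal pinhole camera with a single focal length `focal_px`, the sensor-to-image transform is taken as the identity so that residuals are expressed directly in pixels, and the depth-defocus function S is replaced by a hypothetical thin-lens-like model with the illustrative parameters `kappa` and `z_focus` rather than a calibrated function. All function and parameter names are chosen for illustration only.

```python
import numpy as np


def rodrigues(r):
    """Rotation matrix from an axis-angle vector r (Rodrigues' formula)."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def predict(poses, points, focal_px, kappa, z_focus):
    """Predicted pixel positions (L, K, 2) and PSF radii (L, K).

    poses:  (L, 6) axis-angle rotation and translation per image (world to camera)
    points: (K, 3) scene points in world coordinates
    """
    proj, sigma = [], []
    for pose in poses:
        R, t = rodrigues(pose[:3]), pose[3:]
        cam = points @ R.T + t                 # scene points in camera frame i
        z = cam[:, 2]                          # depth of each point in image i
        proj.append(focal_px * cam[:, :2] / z[:, None])        # assumed pinhole Q
        sigma.append(kappa * np.abs(1.0 / z - 1.0 / z_focus))  # hypothetical S(z)
    return np.array(proj), np.array(sigma)


def combined_residuals(poses, points, obs_px, obs_sigma, alpha,
                       focal_px, kappa, z_focus):
    """Stacked reprojection and defocus residuals; their squared sum is E_comb."""
    proj, sigma = predict(poses, points, focal_px, kappa, z_focus)
    r_reproj = (proj - obs_px).ravel()
    r_defocus = np.sqrt(alpha) * (sigma - obs_sigma).ravel()
    return np.concatenate([r_reproj, r_defocus])


def e_comb(poses, points, obs_px, obs_sigma, alpha, focal_px, kappa, z_focus):
    """Combined error term: squared reprojection error plus alpha-weighted defocus error."""
    r = combined_residuals(poses, points, obs_px, obs_sigma, alpha,
                           focal_px, kappa, z_focus)
    return float(r @ r)


# Synthetic, noise-free example: 8 scene points seen in 3 images.
rng = np.random.default_rng(0)
points = rng.uniform([-0.1, -0.1, 0.4], [0.1, 0.1, 0.6], size=(8, 3))
poses = np.zeros((3, 6))
poses[:, 3] = [0.0, 0.02, 0.04]               # sideways camera translation
cam_kw = dict(focal_px=1000.0, kappa=1.0, z_focus=0.5)
obs_px, obs_sigma = predict(poses, points, **cam_kw)

# Scale the whole reconstruction (points and translations) by a factor s.
s = 1.2
scaled_poses = poses.copy()
scaled_poses[:, 3:] *= s
e_reproj_only = e_comb(scaled_poses, s * points, obs_px, obs_sigma,
                       alpha=0.0, **cam_kw)
e_with_defocus = e_comb(scaled_poses, s * points, obs_px, obs_sigma,
                        alpha=1.0, **cam_kw)
print(e_reproj_only, e_with_defocus)
```

In this synthetic setting, scaling the scene points and camera translations by a common factor leaves the reprojection term essentially unchanged, whereas the defocus term increases. This illustrates why the PSF radii act as a regularisation that fixes the absolute scale. A residual vector of this form could also be passed to a Levenberg-Marquardt routine such as `scipy.optimize.least_squares` with `method='lm'`.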
To minimise the error term $E_{\mathrm{comb}}$, the Levenberg-Marquardt algorithm (Press et al.,
2007) is used, and to reduce the effect of outliers, the M-estimator technique (Rey,