with the absolute depth data of depth from defocus. A work related to the presented approach has been published by Myles and da Vitoria Lobo (1998), where a method to recover affine motion and defocus simultaneously is proposed. However, the spatial extent of the scene is not reconstructed by their method, since it requires planar objects. In contrast, the method described in this section yields a three-dimensional scene reconstruction at absolute scale based on an image sequence acquired with a monocular camera.
5.1.1 Combining Motion, Structure, and Defocus
The structure from motion analysis employed in this section involves the extraction of salient features from the image sequence, which are tracked using the KLT technique (Lucas and Kanade, 1981; Shi and Tomasi, 1994). A depth from defocus analysis is performed for these features according to the method introduced by Kuhl et al. (2006), as described in detail in Sect. 4.2.3. We found experimentally that the random scatter of the feature positions extracted by the KLT tracker is largely independent of the image blur for PSF radii smaller than 5 pixels and is always of the order of 0.1 pixel. However, more features are detected and fewer features are lost by the tracker when the tracking procedure is started on a well-focused image. Hence, the tracking procedure is repeated, starting from the sharpest image located near the middle of the sequence, which displays the largest value of $H$ according to (4.17) averaged over all previously detected features, proceeding towards either end of the sequence and using the regions of interest (ROIs) extracted from this image as reference patterns. The three-dimensional coordinates ${}^{W}\mathbf{x}_k$ of the scene points are then computed by extending the classical bundle adjustment error term (1.25) with an additional error term that takes into account the depth from defocus measurements, leading to the combined error term
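$$
E_\mathrm{comb} = \sum_{i=1}^{L} \sum_{k=1}^{K} \left[ \left\| {}^{S_i}_{I_i}T^{-1}\, \mathcal{Q}\!\left({}^{C_i}_{W}T, \{c_j\}_i, {}^{W}\mathbf{x}_k\right) - {}^{S_i}_{I_i}T^{-1}\, {}^{S_i}\mathbf{x}_k \right\|^2 + \alpha \left( \mathcal{S}\!\left(\left[{}^{C_i}_{W}T\, {}^{W}\mathbf{x}_k\right]_z\right) - \sigma_{ik} \right)^2 \right]. \tag{5.1}
$$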
The error term $E_\mathrm{comb}$ is minimised with respect to the $L$ camera transforms ${}^{C_i}_{W}T$ and the $K$ scene points ${}^{W}\mathbf{x}_k$. The value of $\sigma_{ik}$ corresponds to the estimated PSF radius for feature $k$ in image $i$, $\alpha$ is a weighting factor, $\mathcal{S}$ the depth-defocus function that yields the expected PSF radius of feature $k$ in image $i$, and $\left[{}^{C_i}_{W}T\, {}^{W}\mathbf{x}_k\right]_z$ the $z$ coordinate (depth) of a scene point. The estimated radii $\sigma_{ik}$ of the Gaussian PSFs define a regularisation term in (5.1), such that absolutely scaled three-dimensional coordinates ${}^{W}\mathbf{x}_k$ of the scene points are obtained. The values of ${}^{W}\mathbf{x}_k$ are initialised according to the depth values estimated based on the depth from defocus approach.
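The role of the defocus term in (5.1) as a scale-fixing regulariser can be illustrated with a minimal Python sketch. It rests on several simplifying assumptions that are not part of the text: the projection function is replaced by an ideal pinhole model without lens distortion, the sensor-to-image transform is omitted so that reprojection errors are measured directly in pixels, each camera transform is given as a rotation matrix and a translation vector, and `toy_depth_defocus` is a hypothetical placeholder rather than the calibrated depth-defocus function of Sect. 4.2.3. All names and numerical values are invented for illustration, and neither the optimiser nor the outlier handling is included.

```python
def to_camera(R, t, x_world):
    """Rigid world-to-camera transform: x_cam = R * x_world + t."""
    return [sum(R[r][c] * x_world[c] for c in range(3)) + t[r] for r in range(3)]


def project(x_cam, f_px, u0, v0):
    """Ideal pinhole projection to pixel coordinates (assumption: no distortion)."""
    return (f_px * x_cam[0] / x_cam[2] + u0, f_px * x_cam[1] / x_cam[2] + v0)


def toy_depth_defocus(z, z_focus=0.5, gain=2.0):
    """Hypothetical stand-in for the depth-defocus function: PSF radius vs. depth."""
    return gain * abs(1.0 / z_focus - 1.0 / z)


def combined_error(cameras, points, observations, sigmas, depth_defocus,
                   alpha, f_px, u0, v0):
    """Squared reprojection error plus alpha-weighted squared defocus error,
    summed over all images i and features k.

    cameras[i]         = (R, t) for image i
    points[k]          = 3D world coordinates of feature k
    observations[i][k] = measured pixel position (u, v)
    sigmas[i][k]       = measured PSF radius
    """
    total = 0.0
    for i, (R, t) in enumerate(cameras):
        for k, x_world in enumerate(points):
            x_cam = to_camera(R, t, x_world)
            u, v = project(x_cam, f_px, u0, v0)
            u_obs, v_obs = observations[i][k]
            reproj = (u - u_obs) ** 2 + (v - v_obs) ** 2
            defocus = (depth_defocus(x_cam[2]) - sigmas[i][k]) ** 2
            total += reproj + alpha * defocus
    return total


# Synthetic, noise-free demo scene (all values made up for illustration).
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [(IDENTITY, [0.0, 0.0, 0.0]), (IDENTITY, [-0.1, 0.0, 0.0])]
points = [[0.05, 0.02, 0.6], [-0.03, 0.04, 0.8], [0.01, -0.05, 0.7]]
F_PX, U0, V0 = 1000.0, 320.0, 240.0

observations = [[project(to_camera(R, t, x), F_PX, U0, V0) for x in points]
                for R, t in cameras]
sigmas = [[toy_depth_defocus(to_camera(R, t, x)[2]) for x in points]
          for R, t in cameras]


def scaled_scene(s):
    """Scale scene points and camera translations by the same factor s."""
    cams = [(R, [s * c for c in t]) for R, t in cameras]
    pts = [[s * c for c in x] for x in points]
    return cams, pts


e_true = combined_error(cameras, points, observations, sigmas,
                        toy_depth_defocus, 1.0, F_PX, U0, V0)
cams2, pts2 = scaled_scene(2.0)
e_scaled_reproj_only = combined_error(cams2, pts2, observations, sigmas,
                                      toy_depth_defocus, 0.0, F_PX, U0, V0)
e_scaled = combined_error(cams2, pts2, observations, sigmas,
                          toy_depth_defocus, 1.0, F_PX, U0, V0)
print(e_true, e_scaled_reproj_only, e_scaled)
```

In this toy setting, a scene scaled by a factor of 2 reproduces the image measurements just as well as the true scene when the defocus weight is zero, which is the usual scale ambiguity of monocular bundle adjustment. With a nonzero weight the scaled scene is penalised, because the expected PSF radii no longer match the measured ones.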
To minimise the error term $E_\mathrm{comb}$, the Levenberg-Marquardt algorithm (Press et al., 2007) is used, and to reduce the effect of outliers, the M-estimator technique (Rey,