The stereo algorithm introduced by Schmidt et al. (2007), which is described in this section, relies on the fit of a parametric model to the spatio-temporal neighbourhood
of each interest pixel in an image sequence. This method yields a cloud of three-
dimensional points carrying additional information about the motion properties of
the corresponding scene part. Hence, point correspondences may be resolved which
would remain ambiguous without taking into account the temporal domain, thus re-
ducing the rate of false correspondences. The additional motion cues may be used to
support optional subsequent processing steps dealing with three-dimensional scene
segmentation and object tracking (cf. Sect. 2.3).
As in most stereo vision approaches that establish correspondences between
small image regions, the first processing step of our algorithm consists of determin-
ing interest pixels in order to select the image regions for which three-dimensional
information is computed in a later step. The interest operator may e.g. consist of the local grey value variance or an edge detector. In either case, interest pixels correspond to image regions with small-scale intensity variations, implying the presence of image structures such as object boundaries upon which a correspondence analysis can be based. The presentation in this section is adopted from Schmidt et al. (2007).
The image sequence is defined in (u, v, t, g) space, where u and v denote the pixel coordinates, t the time coordinate, and g the pixel grey value. To the local spatio-temporal neighbourhood of each interest pixel a parameterised function h(P, u, v, t) is adapted, where the vector P denotes the parameters of the function. The interest
operator preferentially extracts image regions along the boundaries of objects in the
scene.
Ideally, an object boundary is described by an abrupt intensity change. In real
images, however, one does not observe such discontinuities, since they are blurred
by the point spread function of the optical system. Therefore, the intensity change
at an object boundary is modelled by a 'soft' function of sigmoidal shape like the
hyperbolic tangent (cf. Sect. 1.4.8.2). Without loss of generality it will be assumed
here that the epipolar lines are parallel to the image rows. As the image regions
inside and outside the object are usually not of uniform intensity, the pixel grey
values around an interest pixel are modelled by a combined sigmoid-polynomial
approach:
h(P, u, v, t) = p1(v, t) tanh[p2(v, t) u + p3(v, t)] + p4(v, t).    (1.118)
The terms p1(v, t), p2(v, t), p3(v, t), and p4(v, t) denote polynomials in v and t. Here it is assumed that the stereo camera system is calibrated (Krüger et al., 2004) and the stereo image pairs are rectified to standard geometry (cf. Sect. 1.5).
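To make the structure of Eq. (1.118) concrete, the following sketch evaluates h(P, u, v, t) with each p_i taken as a first-order polynomial in v and t; the polynomial orders and the coefficient layout of P are illustrative assumptions:

```python
import numpy as np

def h_model(P, u, v, t):
    """Evaluate the sigmoid-polynomial model of Eq. (1.118).

    Each p_i(v, t) = a_i + b_i*v + c_i*t is assumed to be of first
    order here; P stores the 12 coefficients as
    (a_1, b_1, c_1, ..., a_4, b_4, c_4).
    """
    p1, p2, p3, p4 = (P[3 * i] + P[3 * i + 1] * v + P[3 * i + 2] * t
                      for i in range(4))
    # Sigmoid transition along the epipolar (row) direction u, with
    # row- and time-dependent amplitude, steepness, position, and offset.
    return p1 * np.tanh(p2 * u + p3) + p4
```

The arguments u, v, and t may be scalars or NumPy arrays of matching shape, so the same function can evaluate the model over a whole spatio-temporal patch.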
The polynomial p1(v, t) describes the amplitude and p2(v, t) the steepness of the sigmoid function, both of which depend on the image row v, while p3(v, t) accounts for the row-dependent position of the model boundary. The value of p2(v, t) is closely related to the sign of the intensity gradient and to how well the edge is focused, where large values describe sharp edges and small values blurred edges. The polynomial p4(v, t) is a spatially variable offset which models local intensity variations across the object and in the background, e.g. allowing the model to adapt to a cluttered background. All described properties are assumed to be time-dependent. An interest pixel is rejected if the residual of the fit exceeds a given threshold.
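A minimal least-squares fit of this model to a spatio-temporal grey-value patch, with rejection by residual threshold, might look as follows. The first-order polynomials, the initial guess, and the threshold value are illustrative assumptions, and SciPy's generic nonlinear least-squares solver stands in for whatever optimiser the original method employs:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_patch(patch, threshold=1.0):
    """Fit the sigmoid-polynomial model to a (t, v, u) grey-value patch.

    Returns (P, accepted): the fitted parameter vector and whether the
    RMS residual stays below `threshold` grey values.  First-order
    polynomials p_i(v, t) and the threshold are illustrative choices.
    """
    T, V, U = patch.shape
    t, v, u = np.meshgrid(np.arange(T), np.arange(V), np.arange(U),
                          indexing='ij')
    u = u - U // 2  # centre the u axis on the interest pixel

    def model(P):
        p1, p2, p3, p4 = (P[3 * i] + P[3 * i + 1] * v + P[3 * i + 2] * t
                          for i in range(4))
        return p1 * np.tanh(p2 * u + p3) + p4

    def residuals(P):
        return (model(P) - patch).ravel()

    # Crude initial guess: half the grey-value range as amplitude,
    # unit steepness, edge at the patch centre, mean grey value as offset.
    P0 = np.array([np.ptp(patch) / 2, 0, 0,
                   1.0, 0, 0,
                   0.0, 0, 0,
                   patch.mean(), 0, 0])
    res = least_squares(residuals, P0)
    rms = np.sqrt(np.mean(res.fun ** 2))
    return res.x, rms < threshold
```

Interest pixels whose patch yields `accepted == False` would be discarded, implementing the residual-based rejection described above.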