The stereo algorithm introduced by Schmidt et al. (2007), which is described in this section, relies on the fit of a parametric model to the spatio-temporal neighbourhood
of each interest pixel in an image sequence. This method yields a cloud of three-
dimensional points carrying additional information about the motion properties of
the corresponding scene part. Hence, point correspondences may be resolved which
would remain ambiguous without taking into account the temporal domain, thus re-
ducing the rate of false correspondences. The additional motion cues may be used to
support optional subsequent processing steps dealing with three-dimensional scene
segmentation and object tracking (cf. Sect. 2.3).
As in most stereo vision approaches that establish correspondences between
small image regions, the first processing step of our algorithm consists of determin-
ing interest pixels in order to select the image regions for which three-dimensional
information is computed in a later step. The interest operator may e.g. consist of the local grey value variance or an edge detector. In either case, interest pixels correspond to image regions with small-scale intensity variations, implying the presence of image structures such as object boundaries upon which a correspondence analysis can be based. The presentation in this section is adopted from Schmidt et al. (2007).
The image sequence is defined in (u, v, t, g) space, where u and v denote the pixel coordinates, t the time coordinate, and g the pixel grey value. To the local spatio-temporal neighbourhood of each interest pixel a parameterised function h(P, u, v, t) is adapted, where the vector P denotes the parameters of the function. The interest
operator preferentially extracts image regions along the boundaries of objects in the
scene.
Ideally, an object boundary is described by an abrupt intensity change. In real
images, however, one does not observe such discontinuities, since they are blurred
by the point spread function of the optical system. Therefore, the intensity change
at an object boundary is modelled by a 'soft' function of sigmoidal shape like the
hyperbolic tangent (cf. Sect. 1.4.8.2). Without loss of generality it will be assumed
here that the epipolar lines are parallel to the image rows. As the image regions
inside and outside the object are usually not of uniform intensity, the pixel grey
values around an interest pixel are modelled by a combined sigmoid-polynomial
approach:
h(P, u, v, t) = p1(v, t) tanh[p2(v, t) u + p3(v, t)] + p4(v, t).    (1.118)
The terms p1(v, t), p2(v, t), p3(v, t), and p4(v, t) denote polynomials in v and t. Here it is assumed that the stereo camera system is calibrated (Krüger et al., 2004) and the stereo image pairs are rectified to standard geometry (cf. Sect. 1.5).
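To make the structure of Eq. (1.118) concrete, the following sketch evaluates h(P, u, v, t) with each p_i taken as a first-order polynomial in v and t; the polynomial orders and the coefficient layout of P are illustrative assumptions:

```python
import numpy as np

def h_model(P, u, v, t):
    """Evaluate the sigmoid-polynomial model of Eq. (1.118).

    Each p_i(v, t) = a_i + b_i*v + c_i*t is assumed to be of first
    order here; P stores the 12 coefficients as
    (a_1, b_1, c_1, ..., a_4, b_4, c_4).
    """
    p1, p2, p3, p4 = (P[3 * i] + P[3 * i + 1] * v + P[3 * i + 2] * t
                      for i in range(4))
    # Sigmoid transition along the epipolar (row) direction u, with
    # row- and time-dependent amplitude, steepness, position, and offset.
    return p1 * np.tanh(p2 * u + p3) + p4
```

The arguments u, v, and t may be scalars or NumPy arrays of matching shape, so the same function can evaluate the model over a whole spatio-temporal patch.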
The polynomial p1(v, t) describes the amplitude and p2(v, t) the steepness of the sigmoid function, both of which depend on the image row v, while p3(v, t) accounts for the row-dependent position of the model boundary. The value of p2(v, t) is closely related to the sign of the intensity gradient and to how well the edge is focused, where large values describe sharp edges and small values blurred edges. The polynomial p4(v, t) is a spatially variable offset which models local intensity variations across the object and in the background, e.g. allowing the model to adapt to a cluttered background. All described properties are assumed to be time-dependent. An interest pixel is rejected if the residual of the fit exceeds a given threshold.
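A minimal least-squares fit of this model to a spatio-temporal grey-value patch, with rejection by residual threshold, might look as follows. The first-order polynomials, the initial guess, and the threshold value are illustrative assumptions, and SciPy's generic nonlinear least-squares solver stands in for whatever optimiser the original method employs:

```python
import numpy as np
from scipy.optimize import least_squares

def fit_patch(patch, threshold=1.0):
    """Fit the sigmoid-polynomial model to a (t, v, u) grey-value patch.

    Returns (P, accepted): the fitted parameter vector and whether the
    RMS residual stays below `threshold` grey values.  First-order
    polynomials p_i(v, t) and the threshold are illustrative choices.
    """
    T, V, U = patch.shape
    t, v, u = np.meshgrid(np.arange(T), np.arange(V), np.arange(U),
                          indexing='ij')
    u = u - U // 2  # centre the u axis on the interest pixel

    def model(P):
        p1, p2, p3, p4 = (P[3 * i] + P[3 * i + 1] * v + P[3 * i + 2] * t
                          for i in range(4))
        return p1 * np.tanh(p2 * u + p3) + p4

    def residuals(P):
        return (model(P) - patch).ravel()

    # Crude initial guess: half the grey-value range as amplitude,
    # unit steepness, edge at the patch centre, mean grey value as offset.
    P0 = np.array([np.ptp(patch) / 2, 0, 0,
                   1.0, 0, 0,
                   0.0, 0, 0,
                   patch.mean(), 0, 0])
    res = least_squares(residuals, P0)
    rms = np.sqrt(np.mean(res.fun ** 2))
    return res.x, rms < threshold
```

Interest pixels whose patch yields `accepted == False` would be discarded, implementing the residual-based rejection described above.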