2.3.4 Object Detection and Tracking in Point Clouds
This section describes the approach to the detection and tracking of objects in three-dimensional point clouds suggested by Schmidt et al. (2007). The presentation is adopted from that work. The method relies on a motion-attributed point cloud obtained with the spacetime stereo approach described in Sect. 1.5.2.5. In a subsequent step, motion-attributed clusters are formed, which are then used for generating and tracking object hypotheses.
2.3.4.1 Motion-Attributed Point Cloud
A three-dimensional representation of the scene is generated with the correlation-based stereo vision algorithm by Franke and Joos (2000) and with the spacetime stereo algorithm described by Schmidt et al. (2007) (cf. Sect. 1.5.2.5). Both stereo techniques generate three-dimensional points based on edges in the image, especially object boundaries. Due to their local approach, they are independent of the object appearance. While correlation-based stereo has the advantage of higher spatial accuracy and is capable of generating more point correspondences, spacetime stereo provides a velocity value for each stereo point. However, it generates a smaller number of points and is spatially less accurate, since not all edges are necessarily well described by the model defined in (1.118). Taking into account these properties of the algorithms, the results are merged into a single motion-attributed three-dimensional point cloud. For each extracted three-dimensional point c_k, an average velocity v̄(c_k) is calculated, using all spacetime points s_j, j ∈ {1, ..., J}, in an ellipsoid neighbourhood defined by δ_S(s_j, c_k) < 1 around c_k. To take into account the spatial uncertainty in depth direction of the spacetime data, δ_S(s_j, c_k) defines a Mahalanobis distance whose correlation matrix Σ contains an entry Σ_z for the depth coordinate which can be derived from the recorded data, leading to

v̄(c_k) = (ρ/J) ∑_{j=1}^{J} v(s_j)   ∀ s_j : δ_S(s_j, c_k) < 1.   (2.36)
The factor ρ denotes the relative scaling of the velocities with respect to the spatial coordinates. It is adapted empirically depending on the speed of the observed objects. This results in a four-dimensional point cloud, where each three-dimensional point is attributed with an additional one-dimensional velocity component parallel to the epipolar lines; see Fig. 2.20d.
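The neighbourhood averaging of (2.36) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function and variable names are invented, and the correlation matrix entries are placeholder values standing in for the uncertainties derived from the recorded data.

```python
import numpy as np

def average_velocity(c_k, spacetime_pts, spacetime_vel, sigma, rho=1.0):
    """Average the velocities of all spacetime points s_j inside the
    Mahalanobis ellipsoid delta_S(s_j, c_k) < 1 around c_k, cf. (2.36).

    sigma: 3x3 correlation matrix with an enlarged depth (z) entry to
    reflect the spatial uncertainty of the spacetime stereo data.
    rho: relative scaling of velocities w.r.t. spatial coordinates.
    """
    sigma_inv = np.linalg.inv(sigma)
    diff = spacetime_pts - c_k                            # (N, 3) offsets
    d2 = np.einsum('ni,ij,nj->n', diff, sigma_inv, diff)  # squared Mahalanobis distances
    inside = d2 < 1.0                                     # delta_S(s_j, c_k) < 1
    if not inside.any():
        return 0.0                                        # no neighbours, no velocity estimate
    # (rho / J) * sum over the J points inside the ellipsoid
    return rho * spacetime_vel[inside].mean()

# Illustrative use: an enlarged z-entry widens the ellipsoid in depth,
# so points that are farther away in z can still contribute.
sigma = np.diag([1.0, 1.0, 4.0])   # placeholder uncertainty values
```

Enlarging Σ_z stretches the neighbourhood along the viewing direction, which is exactly where the spacetime points are least accurate, so depth noise does not exclude otherwise valid neighbours from the average.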
A reference image of the observed scene is used to reduce the amount of data to be processed by masking out three-dimensional points that emerge from static parts of the scene, as shown in Figs. 2.20a and b. Furthermore, only points within a given interval above the ground plane are used, as we intend to localise objects and humans and thus always assume a maximum height for objects above the ground.
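The two filtering steps above can be sketched as follows. The function name, the `project` callback, and the height thresholds are illustrative assumptions, not taken from the original work.

```python
import numpy as np

def filter_points(points, mask_static, project, z_min=0.1, z_max=2.0):
    """Keep only 3D points that (a) do not project onto static parts of
    the reference image and (b) lie within an assumed height interval
    above the ground plane.

    mask_static: boolean reference-image mask of static scene parts.
    project: maps a 3D point to integer pixel coordinates (u, v).
    """
    keep = []
    for p in points:
        u, v = project(p)
        if mask_static[v, u]:              # point stems from the static background
            continue
        if not (z_min <= p[2] <= z_max):   # outside the assumed height interval
            continue
        keep.append(p)
    return np.asarray(keep)
```

The mask test discards background structure cheaply before clustering, while the height interval encodes the prior that relevant objects and humans never exceed a maximum height above the ground plane.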