a viewer-centred representation of the image data. The views are generated automat-
ically from a three-dimensional object model by rendering, and the pose parameters
of each view are stored in a table. Edge templates are computed for each view.
For the input image, the best-fitting template and thus the corresponding pose pa-
rameters are determined by a template matching procedure. The difficult trade-off
between the tessellation constant, i.e. the difference between the pose parameters of
neighbouring views, and the accuracy of pose estimation is alleviated by a technique
for hierarchical template matching (Gavrila and Philomin, 1999).
The input image first undergoes an edge detection procedure. A distance trans-
form (DT) then converts the segmented binary edge image into what is called a
distance image. The distance image encodes the distance in the image plane of each
image point to its nearest edge point. If we denote the set of all points in the image
as $A = \{\mathbf{a}_1, \ldots, \mathbf{a}_N\}$ and the set of all edge points as
$B = \{\mathbf{b}_1, \ldots, \mathbf{b}_M\}$ with $B \subseteq A$, then the distance
$d(\mathbf{a}_n, B)$ for point $\mathbf{a}_n$ is given by

$$ d(\mathbf{a}_n, B) = \min_{\mathbf{b}_m \in B} \|\mathbf{a}_n - \mathbf{b}_m\| \qquad (2.1) $$

where $\|\cdot\|$ is a norm on the points of $A$ and $B$ (e.g. the Euclidean norm). For
numerical simplicity we use the chamfer-2-3 metric (Barrow, 1977) to approximate
the Euclidean metric.
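The chamfer-2-3 distance transform can be computed with the classical two-pass sweep: axial steps cost 2 and diagonal steps cost 3, so the stored values approximate twice the Euclidean distance to the nearest edge pixel. A minimal sketch (the function name and the use of plain nested lists are illustrative choices, not taken from the cited work):

```python
INF = 10**9  # stands in for "infinity" before distances are propagated

def chamfer_dt(edges):
    """edges: 2-D list of 0/1 values (1 = edge pixel).
    Returns the chamfer-2-3 distance image as a 2-D list."""
    h, w = len(edges), len(edges[0])
    d = [[0 if edges[y][x] else INF for x in range(w)] for y in range(h)]

    # Forward pass: propagate distances from the top-left corner.
    for y in range(h):
        for x in range(w):
            for dy, dx, cost in ((-1, -1, 3), (-1, 0, 2), (-1, 1, 3), (0, -1, 2)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    d[y][x] = min(d[y][x], d[ny][nx] + cost)

    # Backward pass: propagate distances from the bottom-right corner.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            for dy, dx, cost in ((1, 1, 3), (1, 0, 2), (1, -1, 3), (0, 1, 2)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    d[y][x] = min(d[y][x], d[ny][nx] + cost)
    return d
```

For a single edge pixel in the centre of a 3 × 3 image this yields 2 for the axial neighbours and 3 for the diagonal neighbours, i.e. roughly twice their Euclidean distances of 1 and √2.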
The chamfer distance $D_C(T, B)$ between an edge template consisting of a set of
edge points $T = \{\mathbf{t}_1, \ldots, \mathbf{t}_Q\}$ with $T \subseteq A$ and the input edge image is given by

$$ D_C(T, B) = \frac{1}{Q} \sum_{n=1}^{Q} d(\mathbf{t}_n, B). \qquad (2.2) $$
A correspondence between a template and an image region is assumed to be present
once the distance measure ('dissimilarity') $D_C(T, B)$ falls below a given
threshold value $\theta$. To reduce false detections, the distance measure was extended to
include oriented edges (Gavrila and Philomin, 1999).
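Given a precomputed distance image, evaluating Eq. (2.2) at a candidate translation reduces to averaging the distance-image values under the template's edge points; scanning all translations and thresholding gives the basic matcher. A sketch under assumed conventions (templates as lists of (row, column) offsets; the function names are illustrative):

```python
def chamfer_score(dist_img, template, ty, tx):
    """Mean distance-image value over the template's edge points placed at
    offset (ty, tx): this is D_C(T, B) of Eq. (2.2), up to the factor-2
    scaling of the chamfer-2-3 metric."""
    h, w = len(dist_img), len(dist_img[0])
    total = 0
    for py, px in template:
        y, x = ty + py, tx + px
        if not (0 <= y < h and 0 <= x < w):
            return float('inf')  # template falls outside the image
        total += dist_img[y][x]
    return total / len(template)

def match(dist_img, template, theta):
    """Return all translations whose dissimilarity falls below theta."""
    h, w = len(dist_img), len(dist_img[0])
    return [(ty, tx)
            for ty in range(h)
            for tx in range(w)
            if chamfer_score(dist_img, template, ty, tx) < theta]
```

Note that only the template points are visited per candidate position; the expensive nearest-edge search was paid once in the distance transform.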
In order to recognise an object with unknown rotation and translation, a set of
transformed templates must be correlated with the distance image. Each template is
derived from a certain rotation of the three-dimensional object. In previous work, a
uniform tessellation often involved a difficult choice for the value of the tessellation
constant. If one chooses a relatively large value, the views that lie 'in between' grid
points on the viewing sphere are not properly represented in the regions where the
aspect graph is undergoing rapid changes. This decreases the accuracy of the mea-
sured pose angles. On the other hand, if one chooses a relatively small value for the
tessellation constant, this results in a large number of templates to be matched on-
line; matching all these templates sequentially is computationally intensive and pro-
hibitive to any real-time performance. The difficult trade-off regarding tessellation
constant is alleviated by a technique for hierarchical template matching, introduced
by Gavrila and Philomin (1999). That technique, designed for distance transform-based
matching, derives in an offline stage a representation that takes into account
the structure of the given distribution of templates, i.e. their mutual degrees of sim-
ilarity. In the online stage, this approach allows an optimisation of the matching
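The coarse-to-fine idea behind such hierarchical matching can be illustrated with a toy sketch: templates are grouped offline by mutual similarity, and online each group's prototype is matched first with a loosened threshold, so that a whole group is pruned when its prototype already scores badly. The grouping into flat (prototype, members) pairs and the additive threshold slack are simplifying assumptions, not the exact scheme of the cited paper:

```python
def hierarchical_match(score, groups, theta, slack=1.0):
    """score(t): dissimilarity of template t at the current image location.
    groups: list of (prototype, members) pairs built offline by similarity.
    Returns the best (template, score) pair below theta, or None."""
    best = None
    for prototype, members in groups:
        # Coarse level: if even the prototype is far off, skip the group.
        if score(prototype) >= theta + slack:
            continue
        # Fine level: evaluate only the surviving group's members.
        for t in members:
            s = score(t)
            if s < theta and (best is None or s < best[1]):
                best = (t, s)
    return best
```

With many templates and effective pruning, the number of dissimilarity evaluations drops from the full template count towards the number of prototypes plus the members of the few surviving groups.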