to both geometry and intensity), as in two frames of video separated by a fraction of a second, this approach works reasonably well, and is indeed the basis for some of the earliest video tracking algorithms (e.g., [307, 442]).
However, we can see that this approach is unsuitable for description and matching of features in a general image pair. Problems arise when the images differ significantly in rotation, scale, illumination, and perspective, as illustrated in Figure 4.11. This section discusses descriptor designs intended to be roughly invariant to these transformations.
4.2.1 Support Regions
The first problem is determining the support region of pixels that should contribute to a feature's descriptor. We exploit the information produced by the feature detectors discussed previously. For example, scale-invariant features such as Harris-Laplace, Hessian-Laplace, and DoG are already detected at a characteristic scale. Therefore, we can draw a circle around each feature whose radius is the characteristic scale, and these circles will ideally be scale-covariant (as illustrated in Figure 4.5). Features such as MSERs or Hessian-Affine automatically produce affine-covariant regions (as illustrated in Figure 4.12). In particular, we can use the circular region produced at the end of the affine adaptation process in Section 4.1.5 as the basis for an affine-invariant descriptor. Thus, we can immediately assume that any feature detected in scale space can be associated with a scale-covariant circle or an affine-covariant ellipse. Any uniform scaling of this circle or ellipse is also covariant; we typically build the descriptor from a larger neighborhood of pixels than the detected region itself (see Figure 4.16d).
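As a rough sketch of this idea (the helper name and the enlargement factor k are illustrative assumptions, not from the text), a scale-covariant square support region can be cut out simply by sizing the patch in proportion to the characteristic scale:

import numpy as np

def support_patch(image, x0, y0, sigma, k=3.0):
    """Extract a square support region around a feature.

    The patch half-width is k * sigma, so the region grows and shrinks
    with the feature's characteristic scale (scale covariance). k > 1
    enlarges the neighborhood beyond the detection circle itself.
    """
    r = int(round(k * sigma))
    h, w = image.shape[:2]
    # Clamp to the image bounds; real implementations typically pad the
    # image or discard features whose support region leaves the image.
    x_lo, x_hi = max(0, x0 - r), min(w, x0 + r + 1)
    y_lo, y_hi = max(0, y0 - r), min(h, y0 + r + 1)
    return image[y_lo:y_hi, x_lo:x_hi]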
Many descriptor algorithms assume a square patch is given around the feature location as opposed to a circular one. This means we need to assign a reliable orientation to the feature to consistently define the "top" edge of the square. The easiest approach is to estimate the dominant gradient orientation of a patch. Lowe [306] suggested estimating this orientation based on a histogram of pixel gradient orientations over the support region of the scale-covariant circle. Concretely, for each pixel (x, y) in the support region, we estimate the gradient magnitude M(x, y) and orientation θ(x, y). We create a histogram h(θ) of the gradient orientations, where the angles are quantized (for example, using thirty-six bins, one for each ten degrees). For each pixel (x, y), we increment the bin corresponding to θ(x, y) by the quantity M(x, y) G(x − x0, y − y0, 1.5σ), where G is a Gaussian function, (x0, y0) is the center of the support region, and σ is the scale of the feature. That is, each pixel contributes in proportion to the strength of its gradient and its proximity to the center of the support region. After forming the orientation histogram, we detect its peak and use the maximizer θ as the dominant gradient orientation. If the histogram contains more
than one large peak, multiple features can be generated at the same location and
scale with different orientations. Figure 4.16 illustrates the process. An alternative,
as discussed in Section 4.2.4, is to build a descriptor that is itself rotation-invariant,
instead of explicitly estimating the orientation of the feature and rotating the support
region.
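The following is a minimal sketch of this orientation assignment, assuming a square grayscale support patch centered on the feature. The 36-bin quantization and 1.5σ Gaussian weighting follow the text; the 0.8 secondary-peak threshold and the function name are illustrative assumptions:

import numpy as np

def dominant_orientations(patch, sigma, n_bins=36, peak_ratio=0.8):
    """Dominant gradient orientation(s) of a square support patch.

    Each pixel votes for its quantized orientation with weight
    M(x, y) * G(x - x0, y - y0, 1.5 * sigma). Returns the bin centers
    of all peaks within peak_ratio of the maximum, so one feature may
    yield several oriented copies.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                          # M(x, y)
    theta = np.mod(np.arctan2(gy, gx), 2 * np.pi)   # θ(x, y) in [0, 2π)

    h, w = patch.shape
    y, x = np.mgrid[0:h, 0:w]
    y0, x0 = (h - 1) / 2.0, (w - 1) / 2.0
    s = 1.5 * sigma
    weight = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * s ** 2))  # G

    # Quantize orientations into n_bins bins and accumulate weighted votes.
    bins = (theta / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=(mag * weight).ravel(),
                       minlength=n_bins)

    # Keep every bin close to the global peak as a separate orientation.
    peaks = np.flatnonzero(hist >= peak_ratio * hist.max())
    return (peaks + 0.5) * (2 * np.pi / n_bins)

A production implementation would also interpolate each peak's position between neighboring bins rather than returning raw bin centers.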
After determining a scale- and rotation-normalized patch around a feature location, we resample it to have a uniform number of pixels (for example, 41 × 41 pixels).
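A sketch of this resampling step, assuming OpenCV is available and that the feature is described by its center, scale σ, and dominant orientation θ (the support-enlargement factor k and the 41 × 41 output size are illustrative choices following the example above):

import cv2
import numpy as np

def normalized_patch(image, x0, y0, sigma, theta, out_size=41, k=3.0):
    """Resample a scale- and rotation-normalized patch around a feature.

    Maps a square of side 2*k*sigma, centered at (x0, y0) and aligned
    with the dominant orientation theta, onto an out_size x out_size grid.
    """
    scale = out_size / (2.0 * k * sigma)  # output pixels per input pixel
    # Rotate by -theta so the dominant orientation aligns with the
    # patch's x-axis, then scale; the translation column is filled in
    # below so that (x0, y0) lands at the center of the output patch.
    c, s = np.cos(-theta), np.sin(-theta)
    M = np.array([
        [scale * c, -scale * s, 0.0],
        [scale * s,  scale * c, 0.0],
    ])
    center = np.array([x0, y0])
    offset = np.array([(out_size - 1) / 2.0, (out_size - 1) / 2.0])
    M[:, 2] = offset - M[:, :2] @ center
    return cv2.warpAffine(image, M, (out_size, out_size),
                          flags=cv2.INTER_LINEAR)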