to both geometry and intensity), as in two frames of video separated by a fraction of a second, this approach works reasonably well, and is indeed the basis for some of the earliest video tracking algorithms (e.g., [307, 442]).
However, we can see that this approach is unsuitable for description and matching of features in a general image pair. Problems arise when the images differ significantly in rotation, scale, illumination, and perspective, as illustrated in Figure 4.11. This section discusses descriptor designs intended to be roughly invariant to these transformations.
4.2.1 Support Regions
The first problem is determining the support region of pixels that should contribute to a feature's descriptor. We exploit the information produced by the feature detectors discussed previously. For example, scale-invariant features such as Harris-Laplace, Hessian-Laplace, and DoG are already detected at a characteristic scale. Therefore, we can draw a circle around each feature whose radius is the characteristic scale, and these circles will ideally be scale-covariant (as illustrated in Figure 4.5). Features such as MSERs or Hessian-Affine automatically produce affine-covariant regions (as illustrated in Figure 4.12). In particular, we can use the circular region produced at the end of the affine adaptation process in Section 4.1.5 as the basis for an affine-invariant descriptor. Thus, we can immediately assume that any feature detected in scale space can be associated with a scale-covariant circle or an affine-covariant ellipse. Any uniform scaling of this circle or ellipse is also covariant; we typically build the descriptor from a larger neighborhood of pixels than the detected region itself (see Figure 4.16d).
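As a rough sketch of this idea (the helper name and the enlargement factor k are illustrative assumptions, not from the text), a scale-covariant square support region can be cut out simply by sizing the patch in proportion to the characteristic scale:

import numpy as np

def support_patch(image, x0, y0, sigma, k=3.0):
    """Extract a square support region around a feature.

    The patch half-width is k * sigma, so the region grows and shrinks
    with the feature's characteristic scale (scale covariance). k > 1
    enlarges the neighborhood beyond the detection circle itself.
    """
    r = int(round(k * sigma))
    h, w = image.shape[:2]
    # Clamp to the image bounds; real implementations typically pad the
    # image or discard features whose support region leaves the image.
    x_lo, x_hi = max(0, x0 - r), min(w, x0 + r + 1)
    y_lo, y_hi = max(0, y0 - r), min(h, y0 + r + 1)
    return image[y_lo:y_hi, x_lo:x_hi]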
Many descriptor algorithms assume a square patch is given around the feature location as opposed to a circular one. This means we need to assign a reliable orientation to the feature to consistently define the "top" edge of the square. The easiest approach is to estimate the dominant gradient orientation of a patch. Lowe [306] suggested estimating this orientation based on a histogram of pixel gradient orientations over the support region of the scale-covariant circle. Concretely, for each pixel (x, y) in the support region, we estimate the gradient magnitude M(x, y) and orientation θ(x, y). We create a histogram h(θ) of the gradient orientations, where the angles are quantized (for example, using thirty-six bins, one for each ten degrees). For each pixel (x, y), we increment the bin corresponding to θ(x, y) by the quantity M(x, y) G(x − x0, y − y0, 1.5σ), where G is a Gaussian function, (x0, y0) is the center of the support region, and σ is the scale of the feature. That is, each pixel contributes in proportion to the strength of its gradient and its proximity to the center of the support region. After forming the orientation histogram, we detect its peak and use the maximizer θ as the dominant gradient orientation. If the histogram contains more
than one large peak, multiple features can be generated at the same location and
scale with different orientations. Figure 4.16 illustrates the process. An alternative,
as discussed in Section 4.2.4, is to build a descriptor that is itself rotation-invariant,
instead of explicitly estimating the orientation of the feature and rotating the support
region.
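The following is a minimal sketch of this orientation assignment, assuming a square grayscale support patch centered on the feature. The 36-bin quantization and 1.5σ Gaussian weighting follow the text; the 0.8 secondary-peak threshold and the function name are illustrative assumptions:

import numpy as np

def dominant_orientations(patch, sigma, n_bins=36, peak_ratio=0.8):
    """Dominant gradient orientation(s) of a square support patch.

    Each pixel votes for its quantized orientation with weight
    M(x, y) * G(x - x0, y - y0, 1.5 * sigma). Returns the bin centers
    of all peaks within peak_ratio of the maximum, so one feature may
    yield several oriented copies.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                          # M(x, y)
    theta = np.mod(np.arctan2(gy, gx), 2 * np.pi)   # θ(x, y) in [0, 2π)

    h, w = patch.shape
    y, x = np.mgrid[0:h, 0:w]
    y0, x0 = (h - 1) / 2.0, (w - 1) / 2.0
    s = 1.5 * sigma
    weight = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * s ** 2))  # G

    # Quantize orientations into n_bins bins and accumulate weighted votes.
    bins = (theta / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=(mag * weight).ravel(),
                       minlength=n_bins)

    # Keep every bin close to the global peak as a separate orientation.
    peaks = np.flatnonzero(hist >= peak_ratio * hist.max())
    return (peaks + 0.5) * (2 * np.pi / n_bins)

A production implementation would also interpolate each peak's position between neighboring bins rather than returning raw bin centers.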
After determining a scale- and rotation-normalized patch around a feature location, we resample it to have a uniform number of pixels (for example, 41 × 41 pixels).
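A sketch of this resampling step, assuming OpenCV is available and that the feature is described by its center, scale σ, and dominant orientation θ (the support-enlargement factor k and the 41 × 41 output size are illustrative choices following the example above):

import cv2
import numpy as np

def normalized_patch(image, x0, y0, sigma, theta, out_size=41, k=3.0):
    """Resample a scale- and rotation-normalized patch around a feature.

    Maps a square of side 2*k*sigma, centered at (x0, y0) and aligned
    with the dominant orientation theta, onto an out_size x out_size grid.
    """
    scale = out_size / (2.0 * k * sigma)  # output pixels per input pixel
    # Rotate by -theta so the dominant orientation aligns with the
    # patch's x-axis, then scale; the translation column is filled in
    # below so that (x0, y0) lands at the center of the output patch.
    c, s = np.cos(-theta), np.sin(-theta)
    M = np.array([
        [scale * c, -scale * s, 0.0],
        [scale * s,  scale * c, 0.0],
    ])
    center = np.array([x0, y0])
    offset = np.array([(out_size - 1) / 2.0, (out_size - 1) / 2.0])
    M[:, 2] = offset - M[:, :2] @ center
    return cv2.warpAffine(image, M, (out_size, out_size),
                          flags=cv2.INTER_LINEAR)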