Features and Matching - Computer Vision for Visual Effects

estimate the epipolar geometry and reject feature matches that pass the above tests

but are inconsistent with the epipolar lines (e.g., [576]). We discuss epipolar geometry

in detail in Chapter 5.

4.2.3 Histogram-Based Descriptors

Several of the most commonly used descriptors can be characterized as histograms of pixel intensities or gradient values measured over a set of subregions superimposed on the feature's support region. In particular, the SIFT descriptor proposed by Lowe [306] has proven extremely popular. The input to the descriptor is a feature location (x, y), its estimated scale σ, and its estimated dominant orientation θ. We first create an oriented square centered at the feature location so that the top edge corresponds to the dominant orientation (Figure 4.17a). Each side of the square is a multiple of the feature's scale (e.g., 6σ).
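The geometry of the oriented square can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `oriented_square_corners` and the parameter `scale_multiple` are hypothetical, θ is assumed to be in radians, and the mapping from the dominant orientation to the rotation angle (that is, which edge ends up as the "top") is an assumed convention.

```python
import math

def oriented_square_corners(x, y, sigma, theta, scale_multiple=6.0):
    """Return the four corners of a square centered at (x, y).

    The side length is scale_multiple * sigma (6 * sigma by default, as in
    the example in the text). The square is rotated by theta radians about
    its center; theta = 0 gives an axis-aligned square (assumed convention).
    Corners are listed in the order top-left, top-right, bottom-right,
    bottom-left of the unrotated square.
    """
    half = 0.5 * scale_multiple * sigma
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in [(-half, -half), (half, -half), (half, half), (-half, half)]:
        # Rotate the offset (dx, dy) by theta, then translate to the center.
        corners.append((x + c * dx - s * dy, y + s * dx + c * dy))
    return corners
```

For a feature at (10, 10) with σ = 2 and θ = 0, the side length is 12 and the corners lie 6 pixels from the center along each axis.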

We then rotate the square, resampling the image pixels and smoothing the intensities with the appropriate Gaussian for the feature's characteristic scale. The gradient at each resampled pixel is estimated and weighted by a Gaussian centered at the middle of the square with a standard deviation of half the square's width; the goal is to emphasize gradients closer to the center of the square.
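The Gaussian weighting step might look as follows. This sketch assumes the patch has already been rotated, resampled, and smoothed, and that gradients are estimated with central differences (the text does not specify the gradient estimator). The names `gaussian_weight` and `weighted_gradients` are hypothetical.

```python
import math

def gaussian_weight(u, v, width):
    """Weight for a sample at offset (u, v) from the center of the square.

    The standard deviation is half the square's width, as stated in the text.
    """
    std = 0.5 * width
    return math.exp(-(u * u + v * v) / (2.0 * std * std))

def weighted_gradients(patch):
    """Return Gaussian-weighted gradients for the interior pixels of a patch.

    patch is a square list of rows of intensities. Each output entry is
    (row, col, weighted_magnitude, orientation), with the orientation in
    [0, 2*pi). Central differences are an assumption of this sketch.
    """
    n = len(patch)
    center = (n - 1) / 2.0
    samples = []
    for r in range(1, n - 1):
        for c in range(1, n - 1):
            gx = 0.5 * (patch[r][c + 1] - patch[r][c - 1])
            gy = 0.5 * (patch[r + 1][c] - patch[r - 1][c])
            w = gaussian_weight(c - center, r - center, n)
            magnitude = w * math.hypot(gx, gy)
            orientation = math.atan2(gy, gx) % (2.0 * math.pi)
            samples.append((r, c, magnitude, orientation))
    return samples
```

A sample at the center of the square receives weight 1, while a sample one standard deviation away (half the square's width) receives weight exp(-1/2), roughly 0.61.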

Next, we subdivide the square into a 4 × 4 grid of smaller squares and create a coarsely quantized histogram of eight gradient orientations within each grid square (Figure 4.17b). To make the descriptor more robust to small misalignments, the gradient at a given pixel contributes to multiple grid squares and multiple histogram bins based on trilinear interpolation (see Problem 4.19). We collect the eight histogram values from each of the sixteen grid squares into a 128-dimensional vector. The final descriptor is obtained by normalizing this vector to unit length, zeroing out any extremely large values (e.g., greater than 0.2), and renormalizing to unit length.
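The histogram and normalization steps can be sketched as follows. This is a simplified illustration under stated assumptions: each gradient sample is assigned to a single grid square and a single orientation bin (hard binning), so the trilinear interpolation described above is deliberately omitted. Large values are zeroed, following the wording above; some implementations clamp them to the threshold instead. The names `normalize`, `sift_like_descriptor`, and `large_value_threshold` are hypothetical.

```python
import math

def normalize(vec):
    """Scale vec to unit Euclidean length (returned unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def sift_like_descriptor(samples, patch_size, grid=4, bins=8,
                         large_value_threshold=0.2):
    """Build a grid * grid * bins descriptor from weighted gradient samples.

    samples: iterable of (row, col, magnitude, orientation), with the
    orientation in [0, 2*pi) and row, col in [0, patch_size).
    Hard binning only; trilinear interpolation is not implemented here.
    """
    desc = [0.0] * (grid * grid * bins)
    cell = patch_size / float(grid)
    for r, c, magnitude, orientation in samples:
        grid_row = min(int(r / cell), grid - 1)
        grid_col = min(int(c / cell), grid - 1)
        b = int(orientation / (2.0 * math.pi) * bins) % bins
        desc[(grid_row * grid + grid_col) * bins + b] += magnitude
    desc = normalize(desc)
    # Zero out extremely large entries, as described in the text.
    desc = [0.0 if v > large_value_threshold else v for v in desc]
    return normalize(desc)
```

With the default 4 × 4 grid and eight orientation bins, the result has 8 × 4 × 4 = 128 entries. Suppressing large entries before the second normalization limits the influence of a few very strong gradients on the final vector.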

Figure 4.17. Constructing the SIFT descriptor. (a) An original detected feature, with characteristic scale and dominant orientation. The descriptor is computed from the pixels inside the indicated square, where the small arrow indicates the top edge. (b) The rotated and resampled square of pixels, with the 4 × 4 grid overlaid. The eight lines inside each square indicate the size of the histogram bin for each corresponding orientation. The SIFT descriptor is the concatenation of these 8 × 4 × 4 = 128 gradient magnitudes into a vector, which is then normalized.
