Features and Matching - Computer Vision for Visual Effects


imaging conditions, and should be taken into account when determining if a detection

is repeated. Figure 4.21b illustrates this more stringent test. A detection is

considered repeated if the area of intersection of the two regions is sufficiently large

compared to the area of their union (e.g., above sixty percent).
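
The overlap test can be sketched in a few lines of Python. This is only an illustration, not the evaluation code used in the cited studies: it assumes both regions have already been mapped into the same image frame, approximates each elliptical region by the set of pixels it covers, and uses hypothetical helper names (`ellipse_pixels`, `overlap_ratio`, `is_repeated`). The default threshold of 0.6 follows the sixty percent figure mentioned above.

```python
import math


def ellipse_pixels(cx, cy, a, b, theta, size):
    """Return the set of (x, y) pixels on a size-by-size grid that lie inside
    an ellipse with center (cx, cy), semi-axes a and b, and rotation theta."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    inside = set()
    for y in range(size):
        for x in range(size):
            dx, dy = x - cx, y - cy
            # Rotate the offset into the ellipse's own axes.
            u = dx * cos_t + dy * sin_t
            v = -dx * sin_t + dy * cos_t
            if (u / a) ** 2 + (v / b) ** 2 <= 1.0:
                inside.add((x, y))
    return inside


def overlap_ratio(region1, region2):
    """Area of intersection divided by area of union (0.0 if both are empty)."""
    union = region1 | region2
    if not union:
        return 0.0
    return len(region1 & region2) / len(union)


def is_repeated(region1, region2, threshold=0.6):
    """Apply the overlap test: a detection counts as repeated if the ratio
    of intersection to union exceeds the threshold."""
    return overlap_ratio(region1, region2) > threshold
```

For example, two identical ellipses give a ratio of 1.0, while the same ellipse shifted by more than its own width gives a ratio near zero and fails the test.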

Mikolajczyk et al. [329] surveyed the affine-covariant feature detectors discussed

in Section 4.1, and tested them with respect to viewpoint and scale change, blurring,

JPEG compression, and illumination changes on a varied set of images. Their gen-

eral conclusions were that the Hessian-Affine and MSER detectors had the highest

repeatability under the various conditions, followed by the Harris-Affine detector.

In general, Hessian-Affine and Harris-Affine produced a larger number of detected

pairs than the other algorithms. They then used the SIFT descriptor as the basis for

matching features from each detector, computing a matching score as

$$\mathrm{MS} = \frac{M}{\min(N_1, N_2)} \tag{4.42}$$

where M is the number of correct nearest-neighbor matches computed using

Euclidean distance between the descriptors. They generally concluded that the

Hessian-Affine and Harris-Affine detectors produced a large number of matches (but

with a relatively high false alarm rate), while MSER produced a lower number of

matches (but with a low false alarm rate).
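
The matching score of Equation (4.42) can be illustrated with a small Python sketch. It assumes that N1 and N2 are the numbers of features detected in the two images (their definition lies outside this excerpt), that descriptors are plain tuples of floats, and that a ground-truth set of corresponding index pairs is available. The function names are hypothetical.

```python
import math


def nearest_neighbor_matches(desc1, desc2):
    """For each descriptor in desc1, find the index of the closest descriptor
    in desc2 under Euclidean distance. Returns a list of (i, j) pairs."""
    matches = []
    for i, d1 in enumerate(desc1):
        j = min(range(len(desc2)), key=lambda k: math.dist(d1, desc2[k]))
        matches.append((i, j))
    return matches


def matching_score(desc1, desc2, true_pairs):
    """Compute MS = M / min(N1, N2), where M counts the nearest-neighbor
    matches that also appear in the ground-truth set true_pairs."""
    if not desc1 or not desc2:
        return 0.0
    matches = nearest_neighbor_matches(desc1, desc2)
    m = sum(1 for pair in matches if pair in true_pairs)
    # Assumption: N1 and N2 are the feature counts of the two images.
    return m / min(len(desc1), len(desc2))
```

With three descriptors per image, of which two nearest-neighbor matches agree with the ground truth, the score is 2/3.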

Mikolajczyk and Schmid [328] followed up with a more comprehensive evaluation

of feature descriptors, considering combinations of the Harris-Laplace/Affine

and Hessian-Laplace/Affine detectors with most of the descriptors discussed in

Section 4.2. They investigated the same changes in imaging conditions, computing

the precision and recall of each detector/descriptor combination as functions of

a changing parameter (e.g., the rotation angle between the images). Here, precision

and recall are defined as

$$\text{precision} = \frac{\#\,\text{correct matches}}{\#\,\text{total matches}}, \qquad \text{recall} = \frac{\#\,\text{correct matches}}{\#\,\text{true correspondences}} \tag{4.43}$$

where the correct matches and true correspondences are determined from the

repeatability score and region overlap measure defined previously. A good descriptor

should have high precision — that is, few false matches — and high recall — that is,

few matches that are present in the detector results but poorly represented by the

descriptor. Their general conclusions, independent of the detector used, were that

the GLOH and SIFT descriptors had the best performance. Shape contexts and PCA-

SIFT also performed well. This study also confirmed the usefulness of the nearest

neighbor distance ratio for matching SIFT descriptors.
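
As a rough illustration of Equation (4.43) and of nearest neighbor distance ratio matching, the Python sketch below accepts a match only when the closest descriptor is clearly closer than the second closest, then scores the accepted matches against a ground-truth set. The function names, the tuple-based descriptors, and the default ratio of 0.8 are assumptions made for this example, not values taken from the studies cited above.

```python
import math


def ratio_test_matches(desc1, desc2, max_ratio=0.8):
    """Return the set of (i, j) matches whose nearest-neighbor distance is
    less than max_ratio times the second-nearest distance."""
    matches = set()
    for i, d1 in enumerate(desc1):
        dists = sorted((math.dist(d1, d2), j) for j, d2 in enumerate(desc2))
        if len(dists) < 2:
            continue  # the ratio is undefined with fewer than two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if second > 0 and best / second < max_ratio:
            matches.add((i, j))
    return matches


def precision_recall(matches, true_correspondences):
    """precision = correct / total matches; recall = correct / true correspondences."""
    correct = len(matches & true_correspondences)
    precision = correct / len(matches) if matches else 0.0
    recall = correct / len(true_correspondences) if true_correspondences else 0.0
    return precision, recall
```

Tightening `max_ratio` discards ambiguous matches, which tends to raise precision at the cost of recall. Sweeping the threshold and recording both values at each setting gives one way to trace out a precision-recall curve.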

Moreels and Perona [336] undertook a similar controlled evaluation of detector/descriptor

combinations, for the specific problem of matching features in

close-up images of 3D objects with respect to viewpoint and lighting changes. They

found that Hessian-Affine and DoG detectors with SIFT descriptors had consis-

tently high performance for viewpoint changes. MSER and shape contexts, which

performed well on planar scenes in [328], were found to have only average performance

for matching 3D objects. The Harris-Affine detector with the SIFT descriptor
