Local Feature Similarity.
The Computer Vision literature on local features generally uses the notion of distance rather than that of similarity. However, in most cases a similarity function $s()$ can be easily derived from a distance function $d()$. For both SIFT and SURF, the Euclidean distance is typically used as the measure of dissimilarity between two features [14,5].
Let $d(p_1, p_2) \in [0, 1]$ be the normalized distance between two local features $p_1$ and $p_2$. We can define the similarity as:
$$s(p_1, p_2) = 1 - d(p_1, p_2)$$
Obviously $0 \le s(p_1, p_2) \le 1$ for any $p_1$ and $p_2$.
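The definition above can be sketched in a few lines of Python. The text does not say how the distance is normalized, so the scaling here is an assumption: the raw Euclidean distance is divided by a hypothetical upper bound `max_dist` (2.0, the largest possible distance between unit-length descriptors) and clipped to 1. The function names are illustrative only.

```python
import math


def normalized_distance(p1, p2, max_dist=2.0):
    """Euclidean distance between two descriptors, scaled into [0, 1].

    max_dist is an ASSUMED upper bound on the raw distance; 2.0 holds for
    unit-length descriptors. The source does not specify the normalization.
    """
    raw = math.dist(p1, p2)
    return min(raw / max_dist, 1.0)


def similarity(p1, p2, max_dist=2.0):
    """s(p1, p2) = 1 - d(p1, p2), with d the normalized distance."""
    return 1.0 - normalized_distance(p1, p2, max_dist)
```

With this choice, identical descriptors have similarity 1 and diametrically opposite unit-length descriptors have similarity 0.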
Local Features Matching.
A concept that is often used when dealing with local features is local feature matching. In [14], a distance ratio matching scheme was proposed that has also been adopted by [5] and many others. Let us consider a local feature $p_x$ belonging to an image $d_x$ (i.e. $p_x \in d_x$) and an image $d_y$. First, the point $p_y \in d_y$ closest to $p_x$ (in the remainder $NN_1(p_x, d_y)$) is selected as the candidate match. Then, the distance ratio $\sigma(p_x, d_y) \in [0, 1]$ between the closest and the second-closest neighbors of $p_x$ in $d_y$ is considered. The distance ratio is defined as:
$$\sigma(p_x, d_y) = \frac{d(p_x, NN_1(p_x, d_y))}{d(p_x, NN_2(p_x, d_y))}$$
Finally, $p_x$ and $NN_1(p_x, d_y)$ are considered matching if the distance ratio $\sigma(p_x, d_y)$ is smaller than a given threshold. Thus, a matching function between $p_x \in d_x$ and an image $d_y$ is defined as:
$$m(p_x, d_y) = \begin{cases} 1 & \text{if } \sigma(p_x, d_y) < c \\ 0 & \text{otherwise} \end{cases}$$
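As a minimal sketch of the distance ratio test, the following Python code computes the ratio and the matching function by brute force. It assumes descriptors are plain sequences of floats and that the image contains at least two features; the function names and the handling of a zero second-nearest distance are illustrative assumptions, not part of the original scheme.

```python
import math


def distance_ratio(p_x, d_y):
    """sigma(p_x, d_y): nearest over second-nearest Euclidean distance.

    d_y is a list of descriptors with at least two entries.
    """
    if len(d_y) < 2:
        raise ValueError("d_y must contain at least two features")
    dists = sorted(math.dist(p_x, p_y) for p_y in d_y)
    nearest, second = dists[0], dists[1]
    if second == 0.0:
        # ASSUMPTION: two neighbors coincide with p_x, so the match is
        # ambiguous; return the largest ratio so that it is rejected.
        return 1.0
    return nearest / second


def is_match(p_x, d_y, c=0.8):
    """m(p_x, d_y): 1 if the distance ratio is below the threshold c, else 0."""
    return 1 if distance_ratio(p_x, d_y) < c else 0
```

Because the ratio divides one distance by another, any common normalization factor cancels, so raw Euclidean distances can be used directly here.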
In [14], $c = 0.8$ was proposed, with the report that this threshold eliminates 90% of the false matches while discarding less than 5% of the correct matches. In Section 7 we report an experimental evaluation of classification effectiveness for varying $c$ that confirms the results obtained by Lowe. Note that this parameter is used in defining the image similarity measure used as a baseline and in one of our proposed local feature based classifiers.
For Computer Vision applications, the distance ratio described above is used for selecting good candidate matches. More sophisticated algorithms are then used to select the actual matches from these candidates, considering geometric information such as the scale, orientation and coordinates of the interest points. In most cases a Hough transform [3] is used to search for keys that agree upon a particular model pose. To avoid the problem of boundary effects in hashing, each match is hashed into the 2 closest bins in each dimension, giving a total of 16 entries for each hypothesis in the hash table. This method was proposed for SIFT [14] and is very similar to the weak geometry consistency check used in [12].
Thus, we define the set $M_h(d_x, d_y)$ as the matching points in the most populated entry of the hash table containing the Hough transform of the matches in $d_y$ obtained using the distance ratio criterion.
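The voting scheme and the selection of the most populated entry can be sketched as follows. This is only an illustration under stated assumptions: each candidate match is assumed to come with an already computed pose hypothesis `(x, y, log2_scale, orientation_deg)`, the bin widths are arbitrary illustrative values, and all function names are hypothetical. Each match votes for the 2 closest bins in each of the 4 pose dimensions, which yields the 2^4 = 16 hash table entries per hypothesis.

```python
import math
from collections import defaultdict
from itertools import product


def two_closest_bins(value, width, wrap=None):
    """Indices of the two bins of the given width closest to value.

    If wrap is set, indices are taken modulo wrap (used for orientation).
    """
    pos = value / width
    base = math.floor(pos)
    # The second bin is the neighbor on the side nearer to the value.
    neighbor = base + 1 if pos - base >= 0.5 else base - 1
    if wrap is not None:
        base, neighbor = base % wrap, neighbor % wrap
    return (base, neighbor)


def hough_votes(matches, widths=(32.0, 32.0, 1.0, 30.0)):
    """Build the hash table of votes.

    matches: list of (match_id, (x, y, log2_scale, orientation_deg)).
    widths: ASSUMED bin widths for the four pose dimensions.
    """
    n_orient = int(round(360.0 / widths[3]))
    table = defaultdict(list)
    for match_id, (x, y, log_scale, angle) in matches:
        per_dim = [
            two_closest_bins(x, widths[0]),
            two_closest_bins(y, widths[1]),
            two_closest_bins(log_scale, widths[2]),
            two_closest_bins(angle % 360.0, widths[3], wrap=n_orient),
        ]
        # Cartesian product of 2 bins per dimension: 16 entries per match.
        for key in product(*per_dim):
            table[key].append(match_id)
    return table


def most_populated_entry(table):
    """M_h: the matches stored in the entry with the most votes."""
    if not table:
        return []
    return max(table.values(), key=len)
```

For example, `most_populated_entry(hough_votes(matches))` returns the identifiers of the matches that agree on a pose, while matches with an inconsistent pose fall into other entries.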