only retain Harris corners whose quality measure is larger than that of all the points in their $N \times N$ pixel neighborhood for some user-selected $N$.
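The neighborhood test described above can be sketched as follows. This is a minimal illustration, not the book's implementation: the function name `local_maxima`, the requirement that `n` be odd, and the choice to compare border pixels only against neighbors that lie inside the image are all assumptions made for the example.

```python
import numpy as np

def local_maxima(quality, n):
    """Return a boolean mask of pixels whose quality is strictly larger
    than every other value in the surrounding n x n neighborhood."""
    if n % 2 == 0:
        raise ValueError("n must be odd so the window has a center pixel")
    r = n // 2
    # Pad with -inf (an assumption) so that border pixels are compared
    # only against neighbors that actually exist in the image.
    padded = np.pad(quality.astype(float), r, constant_values=-np.inf)
    h, w = quality.shape
    keep = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            window = padded[y:y + n, x:x + n].copy()
            center = window[r, r]
            window[r, r] = -np.inf  # exclude the pixel itself
            keep[y, x] = center > window.max()
    return keep
```

The double loop favors readability over speed; a sliding-window maximum filter would give the same mask faster on large images.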
4.1.1.1 Implementation Details
For a digital image, Harris and Stephens proposed to approximate the gradients in Equation (4.3) with

$$\frac{\partial I}{\partial x}(x, y) = I(x+1, y) - I(x-1, y), \qquad \frac{\partial I}{\partial y}(x, y) = I(x, y+1) - I(x, y-1) \tag{4.5}$$
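As a rough sketch of Equation (4.5), the differences can be computed with array slicing. The function name `central_difference_gradients` is hypothetical, and two conventions are assumed here: the image is stored as `image[y, x]`, and the gradient is left at zero on the one-pixel border where a neighbor is missing.

```python
import numpy as np

def central_difference_gradients(image):
    """Approximate the x and y gradients with the differences of Eq. (4.5)."""
    img = image.astype(float)
    ix = np.zeros_like(img)
    iy = np.zeros_like(img)
    # Axis 1 is x (columns) and axis 0 is y (rows).
    ix[:, 1:-1] = img[:, 2:] - img[:, :-2]   # I(x+1, y) - I(x-1, y)
    iy[1:-1, :] = img[2:, :] - img[:-2, :]   # I(x, y+1) - I(x, y-1)
    return ix, iy
```

Note that, following the equation as written, the differences are not divided by 2, so they are twice the usual central-difference estimate of the slope.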
An alternative that ties in more closely with techniques discussed in the rest of the chapter is to approximate the gradients by convolving the image with the derivatives of a Gaussian function:
$$\frac{\partial I}{\partial x}(x, y) = I(x, y) \ast \frac{\partial G(x, y, \sigma_D)}{\partial x}, \qquad \frac{\partial I}{\partial y}(x, y) = I(x, y) \ast \frac{\partial G(x, y, \sigma_D)}{\partial y} \tag{4.6}$$
where $\ast$ indicates convolution, and

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{1}{2\sigma^2}\left(x^2 + y^2\right)\right) \tag{4.7}$$
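A minimal sketch of Equations (4.6) and (4.7) follows. Differentiating Equation (4.7) gives $\partial G/\partial x = -(x/\sigma^2)\,G$ and $\partial G/\partial y = -(y/\sigma^2)\,G$, which is what the kernels below sample. The function names, the truncation of the kernel at three standard deviations, and the zero padding at the image border are assumptions made for illustration.

```python
import numpy as np

def gaussian(x, y, sigma):
    """The 2D Gaussian of Eq. (4.7)."""
    return np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)

def gaussian_derivative_kernels(sigma_d, radius=None):
    """Sampled x and y derivatives of the Gaussian at derivation scale sigma_d."""
    if radius is None:
        radius = int(np.ceil(3 * sigma_d))  # assumed truncation at 3 sigma
    ax = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(ax, ax)  # x varies along columns, y along rows
    g = gaussian(x, y, sigma_d)
    return -x / sigma_d**2 * g, -y / sigma_d**2 * g

def convolve_same(image, kernel):
    """Direct 2D convolution with zero padding; output has the image's shape."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    flipped = kernel[::-1, ::-1]  # convolution flips the kernel
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * flipped)
    return out
```

For an image `img`, the gradients of Equation (4.6) would then be `convolve_same(img, gx)` and `convolve_same(img, gy)` with `gx, gy = gaussian_derivative_kernels(sigma_d)`. The explicit loops are for clarity only.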
That is, we smooth the image to remove high frequencies before taking the derivative. Also, to make the response as a function of window location smoother, we can replace the binary function $w$ in Equation (4.3) with a radially symmetric function that weights pixels in the center of the window more strongly, such as a Gaussian:
$$\begin{aligned}
w(x, y) &= \frac{1}{2\pi\sigma_I^2} \exp\left(-\frac{1}{2\sigma_I^2}\left((x - x_0)^2 + (y - y_0)^2\right)\right) && (4.8) \\
&= G(x - x_0, y - y_0, \sigma_I) && (4.9)
\end{aligned}$$
where $(x_0, y_0)$ is the pixel at the center of the block.
There are two Gaussian functions at work here. The first Gaussian, in Equation (4.6), uses a derivation scale $\sigma_D$ that specifies the domain over which the image is differentiated to compute the $x$ and $y$ gradients and the amount of smoothing that is applied. The second Gaussian, in Equation (4.8), uses an integration scale $\sigma_I$ that specifies the domain over which the image is integrated to determine which pixels form the “window.” These two parameters play a major role in scale-space feature detection, discussed in the next section. For convenience, $\sigma_I$ is usually taken to be a fixed multiple of $\sigma_D$, with the scaling factor in the range $[1.0, 2.0]$ [326].
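The Gaussian window of Equations (4.8) and (4.9) and the coupling between the two scales can be sketched as below. The function name `gaussian_window`, the 21 × 21 block size, and the scaling factor of 1.5 are illustrative assumptions; the text only states that the factor lies in the range [1.0, 2.0].

```python
import numpy as np

def gaussian_window(shape, center, sigma_i):
    """Weights w(x, y) = G(x - x0, y - y0, sigma_i) over a block of the given shape."""
    x0, y0 = center
    y, x = np.mgrid[0:shape[0], 0:shape[1]]  # y indexes rows, x indexes columns
    d2 = (x - x0)**2 + (y - y0)**2
    return np.exp(-d2 / (2.0 * sigma_i**2)) / (2.0 * np.pi * sigma_i**2)

sigma_d = 1.0              # derivation scale used for the gradients
scale = 1.5                # assumed factor inside the quoted [1.0, 2.0] range
sigma_i = scale * sigma_d  # integration scale tied to the derivation scale
w = gaussian_window((21, 21), center=(10, 10), sigma_i=sigma_i)
```

Because the window is much wider than $\sigma_I$ here, the weights sum to approximately 1, so the window acts as a weighted average centered on $(x_0, y_0)$.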
4.1.1.2 Good Features to Track
Shi, Tomasi, and Kanade [492, 442] observed that the same matrix in Equation (4.3) naturally results from investigating the properties of blocks of pixels that are good for tracking. That is, we consider a sequence of images obtained by a video camera and indexed by time, $I(x, y, t)$. We hypothesize that a given block of pixels at time $t + 1$ is actually some block of pixels $W$ at time $t$ translated by a vector $(u, v)$. That is,
$$I(x + u, y + v, t + 1) = I(x, y, t) \quad \forall (x, y) \in W \tag{4.10}$$
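One way to see what Equation (4.10) asserts is to measure how badly a candidate translation violates it. The sketch below is an illustration under stated assumptions, not the tracking method of the cited papers: the function name `block_translation_error`, the `(x0, y0, width, height)` block layout, integer-valued $(u, v)$, and the use of a sum of squared differences as the error measure are all choices made for the example.

```python
import numpy as np

def block_translation_error(frame_t, frame_t1, block, u, v):
    """Sum of squared differences between block W in frame t and the
    block shifted by (u, v) in frame t+1. Zero means Eq. (4.10) holds."""
    x0, y0, width, height = block
    if x0 + u < 0 or y0 + v < 0:
        raise ValueError("shifted block falls outside frame t+1")
    ref = frame_t[y0:y0 + height, x0:x0 + width].astype(float)
    moved = frame_t1[y0 + v:y0 + v + height, x0 + u:x0 + u + width].astype(float)
    if moved.shape != ref.shape:
        raise ValueError("shifted block falls outside frame t+1")
    return float(np.sum((moved - ref) ** 2))
```

For a frame that is an exact shifted copy of the previous one, the error is zero at the true $(u, v)$ and positive for most other translations on textured content.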