8.2 Theoretical Background
A homography is a mapping between two images of a planar scene $P$. Let $p = (u, v)^\top$ represent the pixel coordinates of a 3D point $\xi \in P$ as observed in the normalized image plane of a pinhole camera. Let $\mathcal{A}$ (resp. $\mathcal{B}$) denote projective coordinates for the image plane of a camera $A$ (resp. $B$), and $\{A\}$ (resp. $\{B\}$) denote its frame of reference. A $(3 \times 3)$ homography matrix $H : \mathcal{A} \to \mathcal{B}$ defines the following mapping: $p_B = w(H, p_A)$, where

$$
w(H, p) =
\begin{pmatrix}
\dfrac{h_{11} u + h_{12} v + h_{13}}{h_{31} u + h_{32} v + h_{33}} \\[2ex]
\dfrac{h_{21} u + h_{22} v + h_{23}}{h_{31} u + h_{32} v + h_{33}}
\end{pmatrix}.
$$
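The mapping $w$ can be sketched numerically as a matrix product in homogeneous coordinates followed by division by the third component, which is exactly the common denominator $h_{31} u + h_{32} v + h_{33}$. The snippet below is a minimal illustration only: the matrix `H` and the point `p_A` are arbitrary values chosen for the example, not data from the text.

```python
import numpy as np

def w(H, p):
    """Apply the 3x3 homography H to the 2D point p = (u, v)."""
    H = np.asarray(H, dtype=float)
    u, v = p
    q = H @ np.array([u, v, 1.0])  # homogeneous image of p
    return q[:2] / q[2]            # divide by h31*u + h32*v + h33

# Illustrative values (assumptions, not taken from the text).
H = np.array([[1.0,   0.2,    3.0],
              [0.1,   1.5,   -2.0],
              [0.001, 0.002,  1.0]])
p_A = (10.0, 20.0)
p_B = w(H, p_A)
```

Because numerator and denominator are both linear in the entries of `H`, multiplying `H` by any nonzero scalar leaves `w(H, p_A)` unchanged.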
The mapping is defined up to a scale factor. That is, for any scaling factor $\mu \neq 0$, $p_B = w(\mu H, p_A) = w(H, p_A)$. The Lie group $SL(3)$ is the set of real matrices $SL(3) = \{ H \in \mathbb{R}^{3 \times 3} \mid \det(H) = 1 \}$. If we suppose that the camera continuously observes the planar object, any homography can be represented by a homography matrix $H \in SL(3)$ such that

$$
H = \gamma K \left( R + \frac{t n^\top}{d} \right) K^{-1} \qquad (8.1)
$$
where $K$ is the upper triangular matrix containing the camera intrinsic parameters, $R$ is the rotation matrix representing the orientation of $\{B\}$ with respect to $\{A\}$, $t$ is the translation vector of coordinates of the origin of $\{B\}$ expressed in $\{A\}$, $n$ is the normal to the planar surface $P$ expressed in $\{A\}$, $d$ is the orthogonal distance of the origin of $\{A\}$ to the planar surface, and $\gamma$ is a scaling factor:

$$
\gamma = \det\left( R + \frac{t n^\top}{d} \right)^{-\frac{1}{3}} = \left( 1 + \frac{n^\top R^\top t}{d} \right)^{-\frac{1}{3}}.
$$
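As a numerical check of (8.1), the sketch below assembles $H$ from $K$, $R$, $t$, $n$ and $d$, and computes $\gamma$ from the determinant. The function name `homography_from_pose` and all numerical values (intrinsics, rotation angle, translation, plane) are hypothetical choices made for illustration; they do not come from the text.

```python
import numpy as np

def homography_from_pose(K, R, t, n, d):
    """Return (H, gamma) with H = gamma * K (R + t n^T / d) K^{-1}.

    Hypothetical helper; assumes det(R + t n^T / d) > 0.
    """
    M = R + np.outer(t, n) / d
    gamma = np.linalg.det(M) ** (-1.0 / 3.0)
    H = gamma * K @ M @ np.linalg.inv(K)
    return H, gamma

# Illustrative values (assumptions, not taken from the text).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
a = 0.1                                    # rotation about the y-axis, in radians
R = np.array([[ np.cos(a), 0.0, np.sin(a)],
              [ 0.0,       1.0, 0.0      ],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.1, -0.05, 0.2])
n = np.array([0.0, 0.0, 1.0])
d = 2.0

H, gamma = homography_from_pose(K, R, t, n, d)
gamma_closed_form = (1.0 + n @ R.T @ t / d) ** (-1.0 / 3.0)
det_H = np.linalg.det(H)
```

With these inputs `det_H` equals 1 up to rounding error, so `H` lies in $SL(3)$, and `gamma` agrees with `gamma_closed_form`, as expected from the identity $\det(R + t n^\top / d) = 1 + n^\top R^\top t / d$.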
Correspondingly, knowing the camera intrinsic parameters matrix $K$, any full rank $3 \times 3$ matrix with unitary determinant can be decomposed according to (8.1) (see [9] for a numerical decomposition and [18] for the analytical decomposition). Note that there exist two possible solutions to the decomposition. The planar surface $P$ is parametrized by

$$
P = \{ \xi \in \{A\} \mid n^\top \xi = d \}.
$$
For any two frames $\{A\}$ and $\{B\}$ whose origins lie on the same side of the planar surface $P$, $n^\top R^\top t > -d$ by construction, and the determinant of the associated homography is $\det(H) = 1$.
The map $w$ is a group action of $SL(3)$ on $\mathbb{R}^2$:

$$
w(H_1, w(H_2, p)) = w(H_1 H_2, p)
$$

where $H_1$, $H_2$ and $H_1 H_2 \in SL(3)$. The geometrical meaning of this property is that the 3D motion of the camera between views $\{A\}$ and $\{B\}$, followed by the 3D