Image Processing Reference


model as before and introducing the term

$$\sum_{j=2}^{N_f} \lambda_r \left\| \mathbf{R}_j - \mathbf{R}_{j-1} \right\|_F^2 \qquad (6.8)$$

in the objective function of Eq. 6.2. As before, $\lambda_r$ is the weight that controls the relative influence of the terms in the objective function. A similar regularizer can also be added for the translation when it is optimized.
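A minimal sketch of the rotation term of Eq. (6.8), assuming the per-frame rotations $\mathbf{R}_j$ are stored as nested Python lists of equal shape (3×3 rotations or 2×3 truncated camera rotations both work). The helper names `frob_sq_diff` and `rotation_smoothness` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def frob_sq_diff(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm ||a - b||_F^2 of two equally sized matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rotation_smoothness(rotations: Sequence[Matrix], lam_r: float) -> float:
    """Sum over j = 2..N_f of lam_r * ||R_j - R_{j-1}||_F^2.

    rotations[0] holds R_1, so list index j corresponds to frame j + 1.
    """
    return sum(lam_r * frob_sq_diff(rotations[j], rotations[j - 1])
               for j in range(1, len(rotations)))


# Example: two identical frames, then a 90-degree rotation about the z-axis.
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Rz = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
print(rotation_smoothness([I3, I3, Rz], lam_r=0.5))  # 2.0
```

In the example, only the last frame pair contributes: four entries of $\mathbf{R}_3 - \mathbf{R}_2$ have magnitude 1, giving $0.5 \cdot 4 = 2$.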

In Olsen and Bartoli [2008], a single term was introduced to subsume both the shape and the camera temporal consistency regularizers, by noting that their respective parameters appear jointly in $\mathbf{C}$. This yields a regularizer of the form

$$\sum_{j=2}^{N_f} \lambda_m \left\| \mathbf{C}_{2j-1:2j} - \mathbf{C}_{2j-3:2j-2} \right\|_F^2 , \qquad (6.9)$$

where, as before, $\mathbf{C}_{2j-1:2j}$ is the $2 \times 3N_s$ matrix containing the two rows of $\mathbf{C}$ corresponding to frame $j$.
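The row slicing in Eq. (6.9) can be sketched as follows, assuming $\mathbf{C}$ is stored as a list of $2N_f$ rows of length $3N_s$, with frame $j$ (1-based, as in the text) occupying rows $2j-1$ and $2j$. The names `frame_rows` and `joint_smoothness` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def frame_rows(C: Matrix, j: int) -> Matrix:
    """Return C_{2j-1:2j}: the two rows of C for frame j (1-based, as in the text)."""
    return C[2 * j - 2 : 2 * j]


def joint_smoothness(C: Matrix, lam_m: float) -> float:
    """Sum over j = 2..N_f of lam_m * ||C_{2j-1:2j} - C_{2j-3:2j-2}||_F^2."""
    if len(C) % 2:
        raise ValueError("C must have 2 * N_f rows")
    n_frames = len(C) // 2
    total = 0.0
    for j in range(2, n_frames + 1):
        cur, prev = frame_rows(C, j), frame_rows(C, j - 1)
        total += lam_m * sum((x - y) ** 2
                             for rc, rp in zip(cur, prev)
                             for x, y in zip(rc, rp))
    return total


# Example with N_f = 2 frames and N_s = 1 (so each frame block is 2 x 3).
C = [[1, 0, 0], [0, 1, 0],   # frame 1
     [1, 0, 0], [0, 2, 0]]   # frame 2
print(joint_smoothness(C, lam_m=3.0))  # 3.0
```

Because shape coefficients and camera parameters are both folded into each $2 \times 3N_s$ block, a single difference between consecutive blocks penalizes changes in either.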

Recently, Rabaud and Belongie [2008, 2009] and Zhu et al. [2010] proposed to exploit a very different kind of temporal information. Instead of assuming that frame-to-frame motion is small, these methods rely on the concept of repetitions: given a sufficiently long video sequence, they assume that similar shapes will appear several times, each time seen from a different viewpoint. Under this assumption, several frames picturing the same shape up to a rigid transformation can be used together to estimate the 3D shape.

In Rabaud and Belongie [2008] and Zhu et al. [2010], the images were clustered based on a reprojection error criterion. Given a pair of images, epipolar geometry can be used to decide whether both images were generated by the same rigid object. Unfortunately, some cases remain ambiguous, and therefore triplets of images need to be compared. Once the image clusters in which the shape moves rigidly have been found, a standard rigid structure-from-motion technique, such as Tomasi and Kanade [1992], can be applied to reconstruct the shape in each cluster. To further improve the global reconstruction over the whole sequence, and to account for temporally continuous deformations rather than piecewise rigid ones, an additional refinement step is performed. The major difference between Rabaud and Belongie [2008] and Zhu et al. [2010] is that the former uses independent clusters of at least 3 frames, whereas the latter looks for overlapping groups of images that are as large as possible.
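The per-cluster rigid reconstruction step can be illustrated with an affine factorization in the spirit of Tomasi and Kanade [1992]. This is a sketch under stated assumptions: it uses NumPy, takes the $2F \times P$ measurement matrix of one cluster ($F$ frames, $P$ points, no missing data), and omits the metric upgrade that resolves the remaining 3×3 affine ambiguity. The returned RMS residual is only a rough indicator of how well one rigid shape explains the cluster; the cited works form clusters from epipolar and reprojection tests, not from this residual. The name `affine_factorization` is hypothetical.

```python
import numpy as np


def affine_factorization(W: np.ndarray):
    """Rank-3 affine factorization of a 2F x P measurement matrix W.

    Rows 2f and 2f+1 hold the x and y image coordinates of all P points in frame f.
    Returns (M, S, t, rms) with M (2F x 3), S (3 x P) and t (2F,) such that
    W is approximated by M @ S + t[:, None]. M and S are only defined up to an
    invertible 3x3 matrix; the metric upgrade of Tomasi and Kanade is omitted.
    """
    t = W.mean(axis=1)                     # per-row centroid = image translation
    W0 = W - t[:, None]                    # registered (centered) measurements
    U, s, Vt = np.linalg.svd(W0, full_matrices=False)
    root = np.sqrt(s[:3])                  # split the 3 largest singular values
    M = U[:, :3] * root                    # motion: scale the first 3 left vectors
    S = root[:, None] * Vt[:3]             # shape: scale the first 3 right vectors
    rms = np.linalg.norm(W0 - M @ S) / np.sqrt(W0.size)  # RMS rank-3 fit error
    return M, S, t, rms
```

A residual near zero means the centered measurements are numerically rank 3, which is what one rigid shape observed by affine cameras produces; a deforming shape generally raises the rank and the residual.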

In Rabaud and Belongie [2009], a different method to account for these repetitions was proposed. Instead of using reprojection errors, it introduced a measure of similarity between triplets of shapes $\{\mathbf{Q}_i, \mathbf{Q}_j, \mathbf{Q}_k\}$, which can be written as

$$a_F(i,j,k) = \sum_{h \in \{i,j,k\}} \left\| \mathbf{Q}_h - \frac{\mathbf{Q}_i + \mathbf{Q}_j + \mathbf{Q}_k}{3} \right\|_F^2 . \qquad (6.10)$$
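A sketch of the triplet measure, assuming it has the form $a_F(i,j,k) = \sum_{h \in \{i,j,k\}} \|\mathbf{Q}_h - (\mathbf{Q}_i + \mathbf{Q}_j + \mathbf{Q}_k)/3\|_F^2$ and that the three shapes are equally sized matrices already expressed in a common frame. Under that reading, the value vanishes when the three shapes coincide and grows as they diverge. `triplet_dissimilarity` is a hypothetical name.

```python
from typing import List

Matrix = List[List[float]]


def triplet_dissimilarity(Q_i: Matrix, Q_j: Matrix, Q_k: Matrix) -> float:
    """Sum over h in {i, j, k} of ||Q_h - (Q_i + Q_j + Q_k) / 3||_F^2."""
    shapes = (Q_i, Q_j, Q_k)
    rows, cols = len(Q_i), len(Q_i[0])
    if any(len(Q) != rows or any(len(r) != cols for r in Q) for Q in shapes):
        raise ValueError("all three shapes must have the same size")
    # Entry-wise mean of the three shapes.
    mean = [[sum(Q[r][c] for Q in shapes) / 3.0 for c in range(cols)]
            for r in range(rows)]
    # Total squared deviation of each shape from that mean.
    return sum((Q[r][c] - mean[r][c]) ** 2
               for Q in shapes for r in range(rows) for c in range(cols))


# Example with 1 x 1 "shapes": deviations from the mean 3.0 are -3, 0 and 3.
print(triplet_dissimilarity([[0.0]], [[3.0]], [[6.0]]))  # 18.0
```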
