Image Processing Reference
model as before and introducing the term
$$\lambda_r \sum_{j=2}^{N_f} \left\| R_j - R_{j-1} \right\|_F^2 \qquad (6.8)$$
in the objective function of Eq. 6.2. As before, $\lambda_r$ is the weight that controls the relative influence of
the terms in the objective function. A similar regularizer can also be added for the translation when
it is optimized.
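As a rough illustration, the rotation smoothness term of Eq. 6.8 could be evaluated as in the NumPy sketch below. The function name `rotation_smoothness` and the array layout (one camera matrix per frame, stacked along the first axis) are assumptions made for this example, not part of the original formulation.

```python
import numpy as np

def rotation_smoothness(rotations, lambda_r):
    """Evaluate lambda_r * sum_{j=2}^{N_f} ||R_j - R_{j-1}||_F^2.

    rotations: array of shape (N_f, r, c), one camera matrix per frame
               (assumed layout; r x c may be 3 x 3 or 2 x 3).
    lambda_r:  scalar weight of the regularizer.
    """
    rotations = np.asarray(rotations, dtype=float)
    # Differences between consecutive frames: R_j - R_{j-1} for j = 2..N_f.
    diffs = rotations[1:] - rotations[:-1]
    # Sum of squared Frobenius norms over all consecutive pairs.
    return lambda_r * np.sum(diffs ** 2)
```

The same function would apply unchanged to a stack of translation vectors, which is one way to realize the analogous translation regularizer mentioned above.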
In Olsen and Bartoli [2008], a single term was introduced to subsume both shape and camera
temporal consistency regularizers by noting that their respective parameters appear simultaneously
in $C$. This yields a regularizer of the form
$$\lambda_m \sum_{j=2}^{N_f} \left\| C_{2j-1:2j} - C_{2j-3:2j-2} \right\|_F^2 , \qquad (6.9)$$
where, as before, $C_{2j-1:2j}$ is the $2 \times 3N_s$ matrix containing the two rows of $C$
corresponding to frame $j$.
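The combined regularizer of Eq. 6.9 can be sketched in the same way. The example below assumes that $C$ is stored as a dense array with $2N_f$ rows, where rows $2j-1$ and $2j$ (1-indexed) belong to frame $j$; the function name `motion_smoothness` is hypothetical.

```python
import numpy as np

def motion_smoothness(C, lambda_m):
    """Evaluate lambda_m * sum_{j=2}^{N_f} ||C_{2j-1:2j} - C_{2j-3:2j-2}||_F^2.

    C:        array of shape (2 * N_f, 3 * N_s); each consecutive pair of
              rows is assumed to correspond to one frame.
    lambda_m: scalar weight of the regularizer.
    """
    C = np.asarray(C, dtype=float)
    if C.ndim != 2 or C.shape[0] % 2 != 0:
        raise ValueError("C must be a 2D array with an even number of rows")
    # Group rows into per-frame 2 x (3 N_s) blocks.
    blocks = C.reshape(-1, 2, C.shape[1])
    # Differences between the blocks of consecutive frames.
    diffs = blocks[1:] - blocks[:-1]
    return lambda_m * np.sum(diffs ** 2)
```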
Recently, it was proposed to exploit a very different kind of temporal information
(Rabaud and Belongie [2008, 2009], Zhu et al. [2010]). Instead of assuming that frame-to-frame
motion is small, these methods rely on the concept of repetitions and assume that, given a sufficiently
long video sequence, similar shapes will appear several times, but seen from different viewpoints.
Under this assumption, several frames picturing the same shape up to a rigid transformation can be
used together to estimate the 3D shape.
In Rabaud and Belongie [2008] and Zhu et al. [2010], the images were clustered based on a
reprojection error criterion. Given a pair of images, epipolar geometry can be used to decide whether
both images were generated by the same rigid object. Unfortunately, some cases remain ambiguous,
and therefore triplets of images need to be compared. Once the image clusters in which
the shape moves rigidly have been found, a standard rigid structure from motion technique, such
as that of Tomasi and Kanade [1992], can be applied to reconstruct the shape in each cluster. To further
improve the global reconstruction over the whole sequence, and to account for temporally continuous
deformations rather than piecewise rigid ones, an additional refinement step is performed. The major
difference between Rabaud and Belongie [2008] and Zhu et al. [2010] is that the
former uses independent clusters of at least 3 frames, whereas the latter looks for overlapping
groups of images that are as large as possible.
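To make the per-cluster reconstruction step more concrete, the sketch below shows only the rank-3 factorization at the core of rigid structure from motion in the spirit of Tomasi and Kanade [1992]. It assumes orthographic projection and complete point tracks within the cluster, and it omits the metric upgrade that enforces orthonormal camera rows, so the result is defined only up to an invertible 3 × 3 transformation. It does not reproduce the clustering or refinement steps of the cited methods, and the function name `rigid_factorization` is hypothetical.

```python
import numpy as np

def rigid_factorization(W):
    """Rank-3 factorization of a measurement matrix for one rigid cluster.

    W: array of shape (2 * F, N) stacking the 2D coordinates of N tracked
       points over the F frames of the cluster (assumed layout).
    Returns (M, S, t) with W approximately equal to M @ S + t, where
    M is (2F, 3), S is (3, N), and t is (2F, 1).
    """
    W = np.asarray(W, dtype=float)
    # Under orthography, the per-row mean accounts for the 2D translation.
    t = W.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(W - t, full_matrices=False)
    # Keep the three dominant components and split the singular values evenly.
    root = np.sqrt(s[:3])
    M = U[:, :3] * root          # motion, up to a 3 x 3 ambiguity
    S = root[:, None] * Vt[:3]   # shape, up to the inverse ambiguity
    return M, S, t
```

For noise-free rigid data the centered measurement matrix has rank at most 3, so `M @ S + t` reproduces `W`; with noise, the truncation gives the best rank-3 approximation in the Frobenius sense.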
In Rabaud and Belongie [2009], a different method to account for these repetitions was
proposed. Instead of using reprojection errors, a measure of similarity between triplets of shapes
$\{Q_i, Q_j, Q_k\}$ was introduced. It can be written as
$$a_F(i,j,k) = \sum_{h \in \{i,j,k\}} \left\| Q_h - \frac{Q_i + Q_j + Q_k}{3} \right\|_F^2 . \qquad (6.10)$$
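A minimal sketch of this triplet measure follows, reading it as the sum of squared Frobenius distances between each shape and the mean of the three. It assumes the three shapes are given as arrays of identical size expressed in a common coordinate frame; the function name `triplet_measure` is hypothetical.

```python
import numpy as np

def triplet_measure(Q_i, Q_j, Q_k):
    """Sum over h in {i, j, k} of ||Q_h - (Q_i + Q_j + Q_k) / 3||_F^2.

    Q_i, Q_j, Q_k: arrays of identical shape, each holding one 3D shape
                   (assumed to be expressed in a common frame).
    """
    shapes = np.stack([np.asarray(Q, dtype=float) for Q in (Q_i, Q_j, Q_k)])
    # Mean shape of the triplet.
    mean_shape = shapes.mean(axis=0)
    # Squared Frobenius deviation of each shape from the mean, summed.
    return float(np.sum((shapes - mean_shape) ** 2))
```

Under this reading, the value is zero when the three shapes coincide and grows as they deviate from their common mean.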