If we have a strong appearance model for the performer (for example, a model for the expected color of each body part [447]), this information can also be incorporated into p(r(t) | θ(t)). Shaheen et al. [438] compared the performance of markerless motion capture algorithms as the choices of image features and optimization approaches were varied.
Instead of explicitly specifying a generative model from a pose to image features,
Agarwal and Triggs [ 3 ] used nonlinear regression on training data to directly predict
pose as a function of an image silhouette. Sigal et al. [ 450 ] used belief propagation on
a graphical model of body part relationships to estimate pose from an observation
likelihood model.
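To make the regression idea concrete, here is a minimal sketch: a ridge regressor mapping silhouette feature vectors directly to pose vectors. The dimensions, the toy linear data, and the function names are invented for illustration; Agarwal and Triggs used richer silhouette descriptors and a nonlinear regressor, not this simple linear map.

```python
import numpy as np

def fit_pose_regressor(features, poses, lam=1e-3):
    """Fit a ridge-regression map from feature vectors to pose vectors."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # append bias column
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ poses)
    return W

def predict_pose(W, feature):
    """Predict a pose vector from a single silhouette feature vector."""
    return np.append(feature, 1.0) @ W

# Toy training set: 20-D "silhouette descriptors" mapped to 30-D "joint
# angle" vectors by a hypothetical linear relationship plus a little noise.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 30))
feats = rng.normal(size=(100, 20))
poses = feats @ A + 0.01 * rng.normal(size=(100, 30))

W = fit_pose_regressor(feats, poses)
est = predict_pose(W, feats[0])  # pose estimate for the first training example
```

The appeal of this discriminative approach is that no generative model of image formation is needed at test time; the cost is that the regressor is only reliable for poses similar to those in the training set.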
7.7.3
Backprojections and Visual Hulls
A cost function like Equation ( 7.37 ) operates entirely in the domain of the M camera
images. An alternate use of silhouettes is to project them into 3D space to constrain
the location of the solid model. As Figure 7.24 illustrates, the rim of each silhouette
back-projects into a region of 3D space. The edges of the correctly posed solid model
must be tangent to each of the back-projected silhouette regions (which we can
compute because the cameras are calibrated). That is, we can define D_s(S_i(t), S̃_i(t)) in Equation (7.37) as the sum of distances from the 3D ray through each point on the i-th silhouette to the closest point on the 3D body model (the short, thick lines in
Figure 7.24 a). Rosenhahn and Brox [ 397 ] and Gall et al. [ 161 ] described examples of
this approach when the kinematic model is parameterized by twists.
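The per-ray distance underlying such a cost can be sketched directly. The function names and the brute-force nearest-point search below are illustrative conveniences, not the cited authors' implementations:

```python
import numpy as np

def point_to_ray_distance(p, c, d):
    """Distance from 3D point p to the ray from camera center c along unit direction d."""
    v = p - c
    t = max(float(v @ d), 0.0)        # closest parameter along the ray, clamped to t >= 0
    return float(np.linalg.norm(v - t * d))

def silhouette_ray_cost(rim_rays, model_points):
    """Sum, over backprojected silhouette rim rays, of the distance to the
    nearest model point (one reading of the D_s term described in the text)."""
    total = 0.0
    for c, d in rim_rays:
        total += min(point_to_ray_distance(p, c, d) for p in model_points)
    return total

# A ray along +z from the origin passes 1 unit from the point (1, 0, 5):
c = np.zeros(3)
d = np.array([0.0, 0.0, 1.0])
print(point_to_ray_distance(np.array([1.0, 0.0, 5.0]), c, d))  # 1.0
```

When the model is correctly posed, every rim ray grazes the body surface, so each minimum distance, and hence the total cost, approaches zero.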
The intersection of the backprojected silhouette volumes in 3D space corresponding to all of the available cameras is called the visual hull [263]. That is, the visual hull
consists of the voxels inside all of the silhouettes from each camera perspective. As
the number of cameras and the diversity of the set of perspectives grows, the visual
hull becomes a more accurate estimate of the 3D space occupied by the solid body.
The general concept is illustrated in Figure 7.25 , and Figure 7.26 shows a real example
involving an articulated human body. Generating the visual hull is also known as the
shape-from-silhouette problem. Similar techniques are discussed in Section 8.3.1 .
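Assuming known 3x4 projection matrices and binary silhouette images, voxel carving can be sketched as follows. This is a toy setup with invented camera matrices, not a production shape-from-silhouette system:

```python
import numpy as np

def carve_visual_hull(grid_points, cameras, silhouettes):
    """Keep only voxels whose projections land inside every camera's silhouette."""
    inside = np.ones(len(grid_points), dtype=bool)
    homo = np.hstack([grid_points, np.ones((len(grid_points), 1))])
    for P, sil in zip(cameras, silhouettes):
        proj = homo @ P.T                              # apply 3x4 projection matrix
        uv = np.round(proj[:, :2] / proj[:, 2:3]).astype(int)
        h, w = sil.shape
        ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        hit = np.zeros(len(grid_points), dtype=bool)
        hit[ok] = sil[uv[ok, 1], uv[ok, 0]]            # pixel lookup: row = v, col = u
        inside &= hit                                  # carve away inconsistent voxels
    return inside

# Two toy orthographic "cameras" (axis-aligned 3x4 matrices) both viewing a
# circular silhouette of radius 10 pixels centered in a 32x32 image.
yy, xx = np.mgrid[0:32, 0:32]
sil = (xx - 16) ** 2 + (yy - 16) ** 2 <= 100
P_front = np.array([[10.0, 0, 0, 16], [0, 10.0, 0, 16], [0, 0, 0, 1]])  # view along z
P_side = np.array([[0, 10.0, 0, 16], [0, 0, 10.0, 16], [0, 0, 0, 1]])   # view along x
pts = np.array([[0.0, 0.0, 0.0], [0.9, 0.9, 0.9]])
inside = carve_visual_hull(pts, [P_front, P_side], [sil, sil])
print(inside)  # the center point survives carving; the corner point does not
```

Each additional camera can only remove voxels, which is why the hull shrinks toward the true shape as viewpoints are added.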
Generally, a visual hull approximation is fairly coarse (i.e., blocky) since subdivision of the capture volume into small voxels requires a substantial amount of memory. Also, a large number of cameras may be required to carve away voxels consistent with
all the silhouettes but nevertheless incorrect (e.g., the protrusions from the chest in
Figure 7.26 b). The visual hull is always an overestimate of the occupied space, and
cannot detect concavities.
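A quick back-of-the-envelope calculation shows why the memory cost grows so fast: halving the voxel size multiplies the voxel count, and hence the memory, by eight.

```python
def voxel_grid_megabytes(n_per_side, bytes_per_voxel=1):
    """Memory for an n x n x n voxel occupancy grid, in MB."""
    return n_per_side ** 3 * bytes_per_voxel / 2 ** 20

# Each doubling of resolution costs 8x the memory:
print(voxel_grid_megabytes(256))   # 16.0 MB
print(voxel_grid_megabytes(512))   # 128.0 MB
print(voxel_grid_megabytes(1024))  # 1024.0 MB
```

Even at one byte per voxel, a grid fine enough to resolve fingers over a room-sized capture volume quickly becomes impractical, which motivates octree or sparse representations in practice.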
Several markerless methods operate entirely in 3D by matching the solid body
model to the visual hull or by matching the respective surfaces. The observation
likelihood has the general form
p(r(t) | θ(t)) ∝ exp(−D_v(V(t), Ṽ(t)))    (7.38)

where V(t) is the visual hull at time t and Ṽ(t) is the solid model corresponding to pose θ(t). D_v is a distance function defined over sets of voxels in 3D. Mikic
et al. [ 324 ] described an early approach in which ellipsoids representing limbs were
fit to the visual hull and their centroids and endpoints were used to fit a kinematic
model parameterized with twists. Cheung et al. [ 91 ] described a similar hierarchical