have the same problems as hard segmentation in the presence of wispy or semi-
transparent foreground objects. Nonetheless, many video matting algorithms begin
with the extraction of temporally consistent, hard-edged foreground pieces in each
frame of video.
Generally, video matting algorithms depend on the optical flow estimated from
the image sequence, which is defined as the dense correspondence field correspond-
ing to the apparent motion of brightness patterns. That is, we compute a vector at
pixel (x, y) at time t of the video sequence that points at the apparent location of that
pixel at time t + 1. This vector field can then be used to propagate the matte estimated
from time t to time t + 1. Section 5.3 discusses the optical flow problem in detail.
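The propagation step can be sketched in a few lines; the following is a minimal nearest-neighbor warp in NumPy, assuming a precomputed backward flow field from frame t + 1 to frame t. The function and array names are illustrative, not taken from any particular matting system.

```python
import numpy as np

def propagate_matte(alpha_t, flow):
    """Warp the matte at time t to time t+1 with nearest-neighbor sampling.

    alpha_t : (H, W) matte at time t
    flow    : (H, W, 2) backward flow; flow[y, x] = (dx, dy) points from
              pixel (x, y) at time t+1 to its apparent source in frame t.
    """
    H, W = alpha_t.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Round the sampled source location and clamp it to the image boundary.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, W - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, H - 1)
    return alpha_t[src_y, src_x]
```

A real system would use subpixel (e.g., bilinear) interpolation and account for occlusions, but the core idea is simply a per-pixel lookup along the flow vectors.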
Layered motion techniques represented an early approach to the video matting
problem. For example, Wang and Adelson [ 528 ] proposed to cluster the pixels of a
video sequence into multiple layers by fitting multiple affine motions to its optical
flow field, while Ayer and Sawhney [ 23 ] proposed an expectation-maximization algo-
rithm to estimate such affine motions based on the change in pixels' appearance and
a minimum-description-length formulation for finding the number of layers. Ke and
Kanade [ 234 ] observed that if the layers arise from planar patches in the scene, the
corresponding affine transformations lie in a low-dimensional subspace, which acts
as a strong constraint for robust layer extraction.
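Fitting a single affine motion model to a set of flow vectors reduces to linear least squares, since the model u(x, y) = a1 + a2 x + a3 y, v(x, y) = a4 + a5 x + a6 y is linear in its six parameters. The layered approaches above alternate such fits with reassigning pixels to layers; a minimal sketch of the fitting step (the function name and data layout are illustrative):

```python
import numpy as np

def fit_affine_motion(points, flow_vectors):
    """Least-squares fit of a six-parameter affine motion model.

    points       : (N, 2) pixel coordinates (x, y)
    flow_vectors : (N, 2) observed flow (u, v) at those pixels
    Returns a (2, 3) matrix A such that (u, v) = A @ [1, x, y].
    """
    N = points.shape[0]
    X = np.column_stack([np.ones(N), points])      # (N, 3) design matrix
    params, *_ = np.linalg.lstsq(X, flow_vectors, rcond=None)
    return params.T                                # rows for u and v
```

Clustering pixels into multiple layers then amounts to running several such fits (e.g., within an EM loop, as in Ayer and Sawhney's formulation) and assigning each pixel to the motion that best explains its flow vector.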
Several video matting methods are somewhat direct extensions of single-image
matting algorithms to video, incorporating a temporal consistency prior to produce
smoothly varying, non-jittery mattes. For example, Chuang et al. [ 96 ] built upon
Bayesian matting by combining it with optical flow. That is, the trimap at time t is
estimated by “flowing” user-generated trimaps from keyframes on either side using
the estimated optical flow fields. The trimaps are modified to ensure the foreground
and background regions are reliable before being input to the standard Bayesian
matting algorithm. If the background is roughly planar, projective transformations
can be estimated as the camera moves to build a background mosaic that acts as
a clean plate, which significantly helps the speed and quality of pulling the matte.
Wexler et al. [ 544 ] and Apostoloff and Fitzgibbon [ 16 ] proposed a related Bayesian
approach, using a similar mosaicing method to obtain the background before esti-
mating the matte, and modeling the prior distribution for α with a beta distribution
as mentioned in Section 2.3.2. They also incorporated a spatiotemporal consistency
prior on α, using learned relationships between the gradients of α and the original
image. The observation was similar to the basic assumption of Poisson matting: that
the matte gradient is roughly proportional to the image gradient.
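This proportionality follows directly from the compositing equation I = αF + (1 − α)B when the foreground and background colors are locally constant, since then ∇I = (F − B)∇α. A small numerical check on a one-dimensional soft edge, with hypothetical constant intensities F and B:

```python
import numpy as np

# Composite I = alpha*F + (1-alpha)*B with constant F and B, so that
# grad I = (F - B) * grad alpha: the matte gradient is proportional
# to the image gradient (the basic Poisson matting assumption).
F, B = 0.9, 0.1                      # hypothetical fg/bg intensities
alpha = np.linspace(0.0, 1.0, 11)    # a 1-D ramp across a soft edge
I = alpha * F + (1 - alpha) * B

# Recover the matte gradient from the image gradient alone.
recovered = np.gradient(I) / (F - B)
assert np.allclose(recovered, np.gradient(alpha))
```

In real images F and B vary, so the relationship holds only approximately, which is why the methods above learn the relationship between the two gradients rather than assume exact proportionality.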
Another family of approaches is based on extending the graph-cut methods of
Section 2.8 to hard foreground/background segmentation in video. These approaches
can be viewed as methods for rotoscoping, or manually outlining contours of
foreground objects in each of many frames of film. Agarwala et al. [ 8 ] proposed a
well-known method for semi-automatic rotoscoping based on joint optimization
of contours over a full video sequence, using manually traced keyframes and incre-
mental user edits as hard constraints and image edges as soft constraints. While
contours were represented as splines in this work, graph-cut algorithms would allow
the segmentation in each frame to be much more detailed, that is, an arbitrary binary
matte. The human-assisted motion annotation algorithm of Liu et al. [ 288 ] discussed
in Section 5.3.6 also can be viewed as an interactive rotoscoping tool.