3.5 Object Segmentation in Videos
A video is composed of a sequence of images. Unlike still-image segmentation,
video segmentation should take the temporal information into account. Many
statistical models have been proposed for video segmentation, either generative
or discriminative. A discriminative model requires a large amount of expensive
labeled data to train a good classifier. In contrast, a generative model can
handle incomplete data and can exploit a large amount of unlabeled data with
the help of a small amount of expensive labeled data. Therefore, generative
models are popular for video segmentation. On the other hand, a discriminative
model relaxes the conditional independence assumption and has better predictive
performance than a generative model, which has drawn considerable attention to
discriminative models in video segmentation. MRFs [62, 63] and CRFs [64-67] are
representative generative and discriminative models in video segmentation,
respectively.
Let $X = \{x_i\}_{i \in S}$ and $Z = \{z_i\}_{i \in S}$ be the observations and
labels of a video, where $S = \{s_i\}$ is the set of units (they can be pixels,
patches, or semantic regions) in the video. Video segmentation then amounts to
maximizing the posterior $p(Z \mid X)$.
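Written out, and assuming the usual maximum a posteriori (MAP) reading of this statement, the segmentation is the labeling that attains the maximum. The symbol $\hat{Z}$ is notation introduced here for illustration and does not appear in the source.

```latex
\hat{Z} = \arg\max_{Z} \; p(Z \mid X)
```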
3.5.1 MRF Model
In the MRF model, the posterior is expressed as proportional to the joint
probability using Bayes' rule:

    $p(Z \mid X) \propto p(Z, X) = p(X \mid Z)\, p(Z)$,    (3.14)

where the prior $p(Z)$ is modeled as an MRF.
In the MRF model, the strong assumption of conditional independence of the
observed data is enforced. Therefore, the likelihood $p(X \mid Z)$ is assumed to
have a factorized form, i.e.,

    $p(X \mid Z) = \prod_{s_i \in S} p(x_i \mid z_i)$.    (3.15)

Here $p(x_i \mid z_i)$ indicates the probability that the unit $s_i$ has the
label $z_i$ based on the observation $x_i$ at $s_i$. Here $x_i$ can be a feature
vector combining color, texture, and motion information. To adapt to changes in
the environment, features that are robust to illumination changes are also
used, such as gradient direction, shadow models, and color co-occurrence.
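The factorized likelihood of Eq. (3.15) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each unit carries a single scalar feature and that each label has a one-dimensional Gaussian likelihood. The function names, the two-label setup, and the means and standard deviations are hypothetical and are not taken from the source.

```python
import numpy as np


def unit_log_likelihoods(x, means, stds):
    """Return an (n_units, n_labels) array of log p(x_i | z_i = k).

    Assumption for illustration: each label k has a 1-D Gaussian
    likelihood with mean means[k] and standard deviation stds[k].
    """
    x = np.asarray(x, dtype=float)[:, None]          # shape (n_units, 1)
    means = np.asarray(means, dtype=float)[None, :]  # shape (1, n_labels)
    stds = np.asarray(stds, dtype=float)[None, :]
    return -0.5 * np.log(2.0 * np.pi * stds**2) - 0.5 * ((x - means) / stds) ** 2


def factorized_log_likelihood(unit_log_lik, labels):
    """Return log p(X | Z) = sum_i log p(x_i | z_i), the log of Eq. (3.15)."""
    unit_log_lik = np.asarray(unit_log_lik, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Pick, for every unit i, the entry belonging to its own label z_i.
    return float(unit_log_lik[np.arange(labels.size), labels].sum())


def independent_labels(unit_log_lik):
    """Label each unit by its best likelihood alone, ignoring the prior p(Z)."""
    return np.argmax(unit_log_lik, axis=1)


# Toy data: three units with hypothetical intensities in [0, 1];
# label 0 is "background" (mean 0.2), label 1 is "foreground" (mean 0.8).
features = np.array([0.1, 0.9, 0.75])
log_lik = unit_log_likelihoods(features, means=[0.2, 0.8], stds=[0.1, 0.1])
labels = independent_labels(log_lik)
print(labels, factorized_log_likelihood(log_lik, labels))
```

Working in the log domain turns the product in Eq. (3.15) into a sum, which avoids numerical underflow when the number of units is large. Because the sketch drops the prior $p(Z)$, every unit is labeled independently of its neighbors; in Eq. (3.14) it is the MRF prior that couples the labels of different units.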
To model the distribution $p(x_i \mid z_i)$, several approaches have been
proposed. The most traditional approach is to model the distribution with
Gaussian Mixture Models (GMMs), using the Expectation Maximization (EM)
algorithm to estimate the model parameters. The GMM has several shortcomings:
it is sensitive to initialization, the EM algorithm takes a long time to
converge, and a suitable number of Gaussian components has to be set. To
address these problems,