Semantic Object Segmentation - Video Segmentation and Its Applications

3.5 Object Segmentation in Videos

A video is composed of a sequence of images. Unlike still-image segmentation,
video segmentation should take temporal information into account. Many
statistical models, either generative or discriminative, have been proposed for
video segmentation. A discriminative model requires a large amount of expensive
labeled data to train a good classifier. A generative model, by contrast, can
handle incomplete data and can exploit a large amount of unlabeled data together
with a small amount of expensive labeled data, which makes generative models
popular for video segmentation. On the other hand, a discriminative model
relaxes the conditional independence assumption and has better predictive
performance than a generative model, which has attracted considerable attention
to discriminative models for video segmentation. Markov random fields (MRFs)
[62, 63] and conditional random fields (CRFs) [64-67] are representative
generative and discriminative models in video segmentation, respectively.

Let $X = \{x_i\}_{s_i \in S}$ and $Z = \{z_i\}_{s_i \in S}$ be the observations
and labels of a video, where $S = \{s_i\}$ is the set of units (they can be
pixels, patches, or semantic regions) in the video. Video segmentation then
amounts to maximizing the posterior $p(Z \mid X)$.

3.5.1 MRF Model

In the MRF model, the posterior is expressed as proportional to the joint
probability using Bayes' rule:

$$p(Z \mid X) \propto p(X \mid Z)\, p(Z), \tag{3.14}$$

where the prior $p(Z)$ is modeled as an MRF.
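To make the role of the prior $p(Z)$ concrete, the following is a minimal sketch of one common MRF prior, the Potts model, on a single frame. The text does not say which MRF is used, so the Potts form, the function name `potts_log_prior`, the 4-connected neighborhood, and the smoothness weight `beta` are all assumptions made for illustration. A video model would typically add temporal neighbors between frames, which this sketch omits.

```python
def potts_log_prior(labels, beta=1.0):
    """Unnormalized log p(Z) for a 2-D label grid under a Potts MRF.

    Assumption (not from the text): every pair of 4-connected neighbours
    with different labels costs `beta`. The normalizing constant is dropped
    because it does not depend on the labeling Z.
    """
    rows, cols = len(labels), len(labels[0])
    disagreements = 0
    for r in range(rows):
        for c in range(cols):
            # Count each horizontal neighbour pair once.
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                disagreements += 1
            # Count each vertical neighbour pair once.
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                disagreements += 1
    return -beta * disagreements


# Hypothetical 3x3 labelings: a clean two-region split and a checkerboard.
smooth = [[0, 0, 1],
          [0, 0, 1],
          [0, 0, 1]]
noisy = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]

smooth_score = potts_log_prior(smooth)
noisy_score = potts_log_prior(noisy)
```

Under this assumed prior the spatially coherent labeling scores higher than the checkerboard, which is the smoothing effect an MRF prior contributes to the posterior in (3.14).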

The MRF model enforces the strong assumption that the observed data are
conditionally independent given the labels. The likelihood $p(X \mid Z)$ is
therefore assumed to have a factorized form, i.e.,

$$p(X \mid Z) = \prod_{s_i \in S} p(x_i \mid z_i). \tag{3.15}$$
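The factorized likelihood can be sketched in a few lines: in the log domain the product over units becomes a sum of per-unit terms. The per-label Gaussian models, the parameter table `CLASS_PARAMS`, and the scalar observations below are hypothetical choices for illustration only; the text does not fix a particular form for $p(x_i \mid z_i)$ at this point.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def factorized_log_likelihood(observations, labels, class_params):
    """log p(X | Z) = sum over units of log p(x_i | z_i).

    Assumption (illustrative): each label z has a 1-D Gaussian
    observation model given by class_params[z] = (mean, variance).
    """
    return sum(
        gaussian_log_pdf(x, *class_params[z])
        for x, z in zip(observations, labels)
    )


# Hypothetical models: label 0 = dark background, label 1 = bright object.
CLASS_PARAMS = {0: (0.2, 0.01), 1: (0.8, 0.01)}

X = [0.18, 0.25, 0.79, 0.83]   # one scalar feature per unit
good_labels = [0, 0, 1, 1]     # labels that match the observations
bad_labels = [1, 1, 0, 0]      # labels that contradict them

good_ll = factorized_log_likelihood(X, good_labels, CLASS_PARAMS)
bad_ll = factorized_log_likelihood(X, bad_labels, CLASS_PARAMS)
```

Because each term depends only on one unit, the likelihood alone would label every unit independently; any spatial or temporal coherence has to come from the prior.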

Here $p(x_i \mid z_i)$ is the likelihood of the observation $x_i$ at the unit
$s_i$ given the label $z_i$, and so measures how well the label $z_i$ explains
that observation. The observation $x_i$ can comprise features that incorporate
color, texture, and motion information. To adapt to changes in the environment,
features that are robust to illumination changes are used, such as gradient
direction, shadow models, and color co-occurrence.

Several approaches have been proposed to model the distribution
$p(x_i \mid z_i)$. The most traditional approach is to model the distribution
with Gaussian mixture models (GMMs) and to estimate the model parameters with
the expectation-maximization (EM) algorithm. The GMM has several shortcomings:
it is sensitive to initialization, the EM algorithm takes a long time to
converge, and a suitable number of Gaussian components has to be set. To
address these problems,

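As a rough illustration of the GMM-plus-EM approach described above, here is a minimal sketch of EM for a one-dimensional mixture using only the Python standard library. The function name `fit_gmm_1d`, the fixed iteration count, the variance floor, and the synthetic data are assumptions for this example. The number of components is set by the length of the caller-supplied initial means, which mirrors two of the shortcomings noted in the text: the component count and the initialization must both be chosen up front.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, means, n_iter=50, var_floor=1e-6):
    """Fit a 1-D GMM with EM, starting from caller-supplied initial means.

    Illustrative sketch: runs a fixed number of iterations instead of
    testing the log-likelihood for convergence.
    """
    k = len(means)
    means = list(means)
    # Start every component with the overall data variance and equal weight.
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data)
    variances = [max(var_all, var_floor)] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([j / total for j in joint])

        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component received no support; leave it unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)

    return weights, means, variances


# Synthetic data: two well-separated clusters around 0.2 and 0.8.
rng = random.Random(0)
data = ([rng.gauss(0.2, 0.05) for _ in range(200)]
        + [rng.gauss(0.8, 0.05) for _ in range(200)])

weights, means, variances = fit_gmm_1d(data, means=[0.0, 1.0])
```

With clusters this well separated the estimates settle after a few iterations; with overlapping clusters or poor initial means, the same loop can need many more iterations or stop at a poor local optimum.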
