Semantic Object Segmentation - Video Segmentation and Its Applications

document. These segments captured the spatial relationships among visual words.

For each discovered object class, good segments were then sifted from bad ones.

Verbeek et al. [52] proposed two aspect-based spatial field models by combining pLSA/LDA with Markov Random Fields (MRF). One is based on averaging over forests of minimal spanning trees linking neighboring image regions. A tree-structured prior is imposed on the object class labels $Z_j = \{z_{ji}\}$ of the image patches in image $j$,

$$p(Z_j) \propto \exp\Big( \sum_i \psi(z_{ji}, z_{j\chi(i)}) + \log \theta_j \Big) \tag{3.11}$$

where $\chi(i)$ is the unique parent of patch $i$ in the tree, and $\psi(z_{ji}, z_{j\chi(i)})$ is a pairwise potential,

$$\psi(z_{ji}, z_{j\chi(i)}) = \rho\,[z_{ji} = z_{j\chi(i)}]. \tag{3.12}$$
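As a rough illustration of how the tree-structured prior in (3.11) and (3.12) scores a labeling, the sketch below evaluates the unnormalized log-prior for one image. The function name `tree_log_prior`, the parent-array encoding of the tree, and the reading of the $\log \theta_j$ term as a per-class table indexed by each patch's label are all assumptions made for this example, not details taken from [52].

```python
import math


def tree_log_prior(labels, parent, log_theta, rho):
    """Unnormalized log p(Z_j) for one image under a tree-structured prior.

    labels[i]    : class label z_ji of patch i
    parent[i]    : index chi(i) of the parent patch, or None for the root
    log_theta[k] : log weight of class k in this image (assumed per-label)
    rho          : value of the pairwise potential when labels agree
    """
    total = 0.0
    for i, z in enumerate(labels):
        p = parent[i]
        # Pairwise term from (3.12): rho if the patch agrees with its parent.
        if p is not None and labels[p] == z:
            total += rho
        # Unary term: log weight of the label chosen for this patch.
        total += log_theta[z]
    return total


# Hypothetical three-patch tree: patch 0 is the root, patches 1 and 2 hang off it.
parent = [None, 0, 0]
log_theta = [math.log(0.5), math.log(0.5)]
smooth = tree_log_prior([0, 0, 0], parent, log_theta, rho=1.0)
mixed = tree_log_prior([0, 0, 1], parent, log_theta, rho=1.0)
print(smooth > mixed)
```

With a positive `rho`, labelings in which children agree with their parents receive a higher score, which is the smoothing effect the tree prior is meant to provide.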

The other model applies an efficient chain-based Expectation Propagation method to regular 8-neighbor Markov Random Fields. The prior over $Z_j$ is given by

$$p(Z_j) \propto \exp\Big( \sum_{i \sim i'} \psi(z_{ji}, z_{ji'}) + \log \theta_j \Big) \tag{3.13}$$

where $i \sim i'$ enumerates spatial neighbor patches $i$, $i'$ in image $j$. The MRF captures the local spatial dependence of image patches. These two models were trained using either patch-level labels or image-level labels. Tested on 240 images of nine object categories from the MSRC data set, they achieved an object segmentation accuracy of 80.2% when trained using patch-level labels and 78.1% when trained using image-level labels. The accuracies of pLSA under these two settings were […].5% and 74.0%, respectively. A similar idea was explored in [58], where a Dirichlet process mixture was introduced to learn the number of object classes automatically from the data. This framework was extended to a Conditional Random Field (CRF) [4] to integrate both local and global features in the images [53, 59].
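The 8-neighbor prior in (3.13) can be evaluated in the same spirit. The following is a minimal sketch, assuming that the pairwise potential is the agreement potential of (3.12), that each unordered neighbor pair is counted once, and that the $\log \theta_j$ term is a per-class table indexed by each patch's label. The function name `grid_log_prior` is hypothetical, and the sketch only scores a given labeling; it does not implement the Expectation Propagation inference of [52].

```python
def grid_log_prior(label_grid, log_theta, rho):
    """Unnormalized log p(Z_j) for a grid of patch labels with 8-neighbor links.

    label_grid[r][c] : class label of the patch at row r, column c
    log_theta[k]     : log weight of class k in this image (assumed per-label)
    rho              : value of the pairwise potential when neighbors agree
    """
    rows, cols = len(label_grid), len(label_grid[0])
    # Half of the 8-neighborhood, so every unordered pair is visited once:
    # right, down, down-right, down-left.
    offsets = [(0, 1), (1, 0), (1, 1), (1, -1)]
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            z = label_grid[r][c]
            total += log_theta[z]
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if inside and label_grid[rr][cc] == z:
                    total += rho
    return total


# Hypothetical 2x2 images with uniform class weights (log weight 0 for both classes).
uniform = grid_log_prior([[0, 0], [0, 0]], [0.0, 0.0], rho=1.0)
checker = grid_log_prior([[0, 1], [1, 0]], [0.0, 0.0], rho=1.0)
print(uniform, checker)
```

Unlike the tree prior, every patch here interacts with all of its spatial neighbors, so the graph contains loops and exact normalization is no longer cheap, which is why an approximate inference method is needed.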

Sudderth et al. [60] proposed a Transformed Dirichlet Process (TDP) model to jointly solve the problems of scene classification and object segmentation. This approach coupled topic models with spatial transformations and consistently accounted for geometric constraints. The spatial relationships among the different parts of objects were explicitly modeled under a hierarchical Bayesian model. Cao et al. [61] proposed a Spatially Coherent Latent Topic Model (Spatial-LTM) to simultaneously classify scene categories and segment objects. It oversegmented images into regions, each intended to carry a coherent latent topic, and treated these regions as visual words. It enforced the spatial coherency of the model by requiring that a single latent topic be assigned to all the image patches within each region.
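The one-topic-per-region constraint can be illustrated with a small sketch. It assumes per-patch log-likelihoods under each topic are already available, and it replaces the probabilistic inference of [61] with a simple hard assignment: the scores of all patches in a region are pooled, and the best-scoring topic is given to every patch in that region. The function name `assign_region_topics` and its inputs are hypothetical.

```python
def assign_region_topics(patch_log_lik, patch_region):
    """Give every patch the single best topic of the region it belongs to.

    patch_log_lik[i][k] : log-likelihood of patch i under topic k (assumed given)
    patch_region[i]     : id of the oversegmented region containing patch i
    Returns a list with one topic index per patch.
    """
    num_topics = len(patch_log_lik[0])
    region_scores = {}
    # Pool the evidence of all patches that share a region.
    for scores, region in zip(patch_log_lik, patch_region):
        acc = region_scores.setdefault(region, [0.0] * num_topics)
        for k, s in enumerate(scores):
            acc[k] += s
    # Pick one topic per region (hard assignment, a simplification).
    region_topic = {
        region: max(range(num_topics), key=acc.__getitem__)
        for region, acc in region_scores.items()
    }
    return [region_topic[region] for region in patch_region]


# Patch 0 alone prefers topic 0, but its region as a whole prefers topic 1.
log_lik = [[-1.0, -2.0], [-3.0, -1.5], [-0.5, -4.0]]
regions = [0, 0, 1]
print(assign_region_topics(log_lik, regions))
```

Pooling evidence over a region means an individual patch can be overruled by its neighbors, which is how the constraint yields spatially coherent segments.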

