Semantic Object Segmentation - Video Segmentation and Its Applications

Thus (3.2) has a closed form,

$$P(Z \mid X; \theta) \propto \prod_i P_C(z_i \mid x_i, \lambda) \times \prod_{r,a} \left[ 1 + \exp\left(w_a^{\top} z_r\right) \right] \times \prod_b \left[ 1 + \exp\left(u_b^{\top} Z\right) \right]. \tag{3.7}$$

Here $\theta = \{\lambda, \{w_a\}, \{u_b\}\}$ are parameters. They are learned from a training set by maximizing the conditional likelihood, as in [39]. Once the parameters are learned, the object class labels are inferred by maximizing posterior marginals.
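Maximizing posterior marginals amounts to choosing, for each pixel independently, the label with the largest marginal probability. The sketch below illustrates only that final step. The function name `mpm_labels` and the toy marginals are assumptions made for illustration, and how the marginals are estimated from (3.7) is not shown.

```python
def mpm_labels(marginals):
    """Pick, for every pixel, the class with the largest posterior marginal.

    marginals[i][k] is assumed to approximate P(z_i = k | X).
    """
    return [max(range(len(p)), key=p.__getitem__) for p in marginals]


# Toy marginals for three pixels and three classes (made-up numbers).
marginals = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.3, 0.4, 0.3],
]
labels = mpm_labels(marginals)
print(labels)  # [0, 2, 1]
```

Because each pixel is decided from its own marginal, the result minimizes the expected number of mislabeled pixels rather than maximizing the joint probability of the whole labeling.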

3.3.2.2 TextonBoost

Under the CRF framework, Shotton et al. [40] proposed TextonBoost to learn a discriminative model of object classes incorporating texture, layout, and context information. Their CRF includes four types of potentials: texture-layout, color, location, and edge.

$$\log P(Z \mid X, \theta) = \sum_i \Bigl[ \underbrace{\psi_i(z_i, X; \theta_\psi)}_{\text{texture-layout}} + \underbrace{\pi(z_i, x_i; \theta_\pi)}_{\text{color}} + \underbrace{\lambda(z_i, i; \theta_\lambda)}_{\text{location}} \Bigr] + \sum_{(i,j) \in \varepsilon} \underbrace{\xi(z_i, z_j, g_{ij}(X); \theta_\xi)}_{\text{edge}} - \log C(\theta, X), \tag{3.8}$$

where $i$ and $j$ are indices of pixels, $(i, j) \in \varepsilon$ are two neighboring pixels, $\theta = \{\theta_\psi, \theta_\pi, \theta_\lambda, \theta_\xi\}$ are parameters, and $C(\theta, X)$ is a normalization term.

The texture-layout potentials are provided by a boosted classifier combining a set of discriminative features called texture-layout filters. The neighborhood of pixel $i$ is partitioned into regions by a predefined spatial kernel. Each texture-layout filter response $v_{[r,t]}(i)$ is the number of pixels with texton $t$ in region $r$. Therefore, texture-layout filters are histograms of textons over defined spatial kernels. They capture texture, spatial layout, and textural context. Discriminative texture-layout filters are selected as weak classifiers and combined into a powerful classifier by Joint Boost [41]. Joint Boost allows weak classifiers to be shared among different object classes, so the learned classifier generalizes better.
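The counting behind a texture-layout filter response can be sketched as follows. This is a minimal illustration, assuming a rectangular region given as offsets from pixel i and a precomputed map of texton indices. The names `texture_layout_response` and `texton_map` are hypothetical, and a practical system would likely use something faster than this direct loop.

```python
def texture_layout_response(texton_map, i, region, t):
    """Count pixels with texton index t inside a region placed relative to pixel i.

    texton_map: 2D list of texton indices (rows x cols).
    i: (row, col) of the reference pixel.
    region: (dr0, dc0, dr1, dc1), inclusive row/col offsets of a rectangle
            relative to i. Parts falling outside the image are ignored.
    """
    rows, cols = len(texton_map), len(texton_map[0])
    r, c = i
    dr0, dc0, dr1, dc1 = region
    count = 0
    for rr in range(max(0, r + dr0), min(rows, r + dr1 + 1)):
        for cc in range(max(0, c + dc0), min(cols, c + dc1 + 1)):
            if texton_map[rr][cc] == t:
                count += 1
    return count


# Toy 4x4 texton map with three texton indices (made-up data).
texton_map = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [2, 2, 2, 1],
    [2, 2, 0, 0],
]
# Region: the 2x2 block above and to the right of pixel (2, 1).
print(texture_layout_response(texton_map, (2, 1), (-2, 1, -1, 2), 1))  # 4
```

Evaluating the same (region, texton) pair at every pixel gives one feature map; a boosting procedure would then pick the pairs whose responses best separate the classes.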

The color potentials model the color distribution of each object class using Gaussian mixture models in CIELab color space.
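As a rough illustration of scoring a pixel color under per-class mixture models, the sketch below evaluates the log-density of a Gaussian mixture at a CIELab value. It assumes diagonal covariances for brevity. The function `gmm_log_density`, the class names, and all numeric parameters are made up for illustration and are not taken from [40].

```python
import math


def gmm_log_density(x, weights, means, variances):
    """Log-density of a diagonal-covariance Gaussian mixture at color x."""
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
        log_terms.append(log_p)
    m = max(log_terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in log_terms))


# Hypothetical class models: (weights, means, variances) over (L, a, b).
class_models = {
    "grass": ([0.6, 0.4],
              [(50.0, -40.0, 40.0), (35.0, -25.0, 30.0)],
              [(100.0, 64.0, 64.0)] * 2),
    "sky": ([1.0], [(70.0, -5.0, -30.0)], [(100.0, 64.0, 64.0)]),
}
pixel = (48.0, -35.0, 38.0)  # a greenish Lab color
scores = {name: gmm_log_density(pixel, *model) for name, model in class_models.items()}
print(max(scores, key=scores.get))  # grass
```

In a CRF these log-densities would act as per-pixel scores that are combined with the other potentials rather than used alone.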

The location potentials model the dependence between the locations of pixels

and object classes. For example, trees and sky tend to appear in the top regions of

images while roads tend to appear in the bottom regions of images.
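One simple way to capture this dependence is to count, over a set of labeled training images, how often each class occurs at each normalized image position. The sketch below assumes a coarse grid and additive smoothing. The function `location_log_prior`, the grid size, and the smoothing constant `alpha` are illustrative assumptions rather than the exact form used in [40].

```python
import math


def location_log_prior(label_maps, num_classes, grid=2, alpha=1.0):
    """Estimate log P(class | normalized location) on a grid x grid canvas."""
    # counts[gr][gc][k] starts at alpha so unseen classes keep nonzero probability.
    counts = [[[alpha] * num_classes for _ in range(grid)] for _ in range(grid)]
    for labels in label_maps:
        rows, cols = len(labels), len(labels[0])
        for r in range(rows):
            for c in range(cols):
                gr = r * grid // rows  # normalized row bin
                gc = c * grid // cols  # normalized column bin
                counts[gr][gc][labels[r][c]] += 1
    return [[[math.log(n / sum(cell)) for n in cell] for cell in row]
            for row in counts]


# Two toy 2x2 label maps; class 0 = sky, class 1 = road (made-up data).
label_maps = [
    [[0, 0], [1, 1]],
    [[0, 0], [0, 1]],
]
prior = location_log_prior(label_maps, num_classes=2)
print(prior[0][0][0] > prior[0][0][1])  # True: sky is favored in the top-left cell
```

Normalizing positions to a fixed grid lets images of different sizes contribute to the same table.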

In the edge potentials, $g_{ij}$ measures the edge features between neighboring pixels $i$ and $j$. A penalty is added if two neighboring pixels have different object class labels, unless there is a strong edge between them.
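One common way to realize such a penalty is a contrast-sensitive Potts model, sketched below. The edge feature here is assumed to be an exponential of the squared color difference plus a constant bias term. The function `edge_penalty` and the values of `beta` and `theta` are illustrative assumptions, not parameters reported in the text.

```python
import math


def edge_penalty(z_i, z_j, color_i, color_j, beta=1.0, theta=(1.0, 0.1)):
    """Penalty for neighboring pixels i and j with labels z_i and z_j."""
    if z_i == z_j:
        return 0.0  # no penalty when the labels agree
    sq_diff = sum((a - b) ** 2 for a, b in zip(color_i, color_j))
    # Edge feature: first entry is near 1 in flat areas, near 0 across a strong edge.
    g_ij = (math.exp(-beta * sq_diff), 1.0)
    return theta[0] * g_ij[0] + theta[1] * g_ij[1]


flat = edge_penalty(0, 1, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))     # no image edge
strong = edge_penalty(0, 1, (0.0, 0.0, 0.0), (10.0, 10.0, 10.0))  # strong image edge
print(flat > strong)  # True: a label change is cheaper across a strong edge
```

With this choice, label boundaries are encouraged to align with intensity or color boundaries in the image.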

TextonBoost was evaluated on 21 object classes from the MSRC database and achieved 72.2% overall accuracy [40]. The confusion matrix is shown in Fig. 3.7.
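For reference, overall accuracy is the fraction of correctly labeled pixels, which can be read off a confusion matrix of raw pixel counts as the diagonal sum divided by the total. The sketch below uses a made-up 3-class matrix, not the MSRC results, and the function name `overall_accuracy` is hypothetical.

```python
def overall_accuracy(confusion):
    """Fraction of correctly labeled pixels.

    confusion[k][m] is assumed to hold the number of pixels of true class k
    that were labeled as class m (raw counts, not row-normalized percentages).
    """
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Toy confusion matrix for three classes (made-up counts).
confusion = [
    [50, 5, 5],
    [10, 30, 10],
    [5, 5, 80],
]
print(overall_accuracy(confusion))  # 0.8
```

If a confusion matrix is given with row-normalized percentages, the per-class pixel counts are also needed to recover the overall figure.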
