Image/Video Segmentation: Current Status, Trends, and Challenges - Video Segmentation and Its Applications


backgrounds, which have been used in many works [55-58]. The second row of Fig. 1.11 shows the ground truth masks for the image pairs. Many approaches have been proposed to address the co-segmentation problem using different optimization techniques, such as the L1 norm model [55], the L2 norm model [56], and the “reward” model [57].

Generally, the problem of co-segmentation can be formulated as an energy optimization, which can be defined as:

$$
x^{*} = \arg\min_{x} \; \underbrace{E_1(x)}_{\text{Intra term}} + \underbrace{E_2(f_1, f_2, x)}_{\text{Inter term}}, \tag{1.9}
$$

where x denotes the label set, with each label taking a value in {0, 1}. f_1 and f_2 are the descriptions of the foreground or background, respectively. E_1 is the intra-image penalty within each image, which can be expressed by an MRF comprising a unary term and a pairwise term. The constraint between images is imposed by the second term E_2, which makes the foregrounds of the images similar to each other. However, optimizing this energy function becomes an NP-hard problem. To overcome this difficulty, different optimization methods have been proposed, such as trust region graph cut [55], quadratic pseudo-boolean optimization [56], graph cut [57], and dual decomposition [59]. A brief review of co-segmentation can be found in [59].
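To make the two-term energy concrete, the following is a minimal sketch, not the formulation of any cited method. It assumes two tiny one-dimensional "images", a Potts pairwise term, known foreground and background mean intensities, and an L1 distance between foreground histograms as the inter term (loosely in the spirit of the L1 norm model). All function names, parameters, and default values are hypothetical. The exhaustive search over labelings grows exponentially with the number of pixels, which is why practical methods rely on the optimization techniques listed above.

```python
from itertools import product


def histogram(pixels, labels, bins=4, max_value=256):
    """Normalized intensity histogram of the pixels labeled 1 (foreground)."""
    hist = [0.0] * bins
    for value, label in zip(pixels, labels):
        if label == 1:
            hist[value * bins // max_value] += 1.0
    total = sum(hist)
    # An empty foreground yields an all-zero histogram.
    return [h / total for h in hist] if total > 0 else hist


def intra_energy(pixels, labels, fg_mean, bg_mean, smooth=0.5):
    """E_1 for one image: unary data cost plus Potts pairwise cost."""
    unary = sum(
        abs(v - (fg_mean if l == 1 else bg_mean)) / 255.0
        for v, l in zip(pixels, labels)
    )
    # Each pair of neighbouring pixels with different labels pays `smooth`.
    pairwise = sum(smooth for a, b in zip(labels, labels[1:]) if a != b)
    return unary + pairwise


def inter_energy(f1, f2):
    """E_2: L1 distance between the two foreground histograms."""
    return sum(abs(a - b) for a, b in zip(f1, f2))


def cosegment_brute_force(img1, img2, fg_mean=200, bg_mean=50, weight=1.0):
    """Try every joint labeling and return (energy, labels1, labels2)."""
    n1 = len(img1)
    best = None
    for labels in product((0, 1), repeat=n1 + len(img2)):
        x1, x2 = labels[:n1], labels[n1:]
        energy = (
            intra_energy(img1, x1, fg_mean, bg_mean)
            + intra_energy(img2, x2, fg_mean, bg_mean)
            + weight * inter_energy(histogram(img1, x1), histogram(img2, x2))
        )
        if best is None or energy < best[0]:
            best = (energy, x1, x2)
    return best


# Two toy images whose bright pixels play the role of the common object.
image_a = [40, 60, 210, 200]
image_b = [205, 195, 55, 45]
best_energy, labels_a, labels_b = cosegment_brute_force(image_a, image_b)
```

With eight pixels in total, the search visits 2^8 = 256 labelings; doubling the pixel count squares that number, so this approach is only useful for illustration.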

Another type of object-driven segmentation is class-specific segmentation, which extracts the object of interest from the given images or video. An example of such work can be found in [62], which segments human faces automatically. This method proposes an effective real-time segmentation system for cutting human faces out of video sequences, consisting of three stages. First, a learning-based face detector is developed to rapidly identify human faces. To speed up the detection process, a face rejection cascade is constructed to remove most of the negative samples while retaining all the face samples. A coarse-to-fine segmentation approach is then used to extract the faces based on a min-cut optimization. Finally, to refine the object boundary, a matting algorithm is employed to estimate the alpha matte based on an adaptive trimap generation method.
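The text does not specify how the adaptive trimap is generated, so the following is only a sketch of the general idea of a trimap, using a fixed-width band as a stand-in for the adaptive method. Given a binary segmentation mask, pixels whose whole neighbourhood is foreground are marked definite foreground, pixels whose whole neighbourhood is background are marked definite background, and the rest form the unknown band where a matting algorithm would estimate alpha. The function name `make_trimap` and the `radius` parameter are hypothetical.

```python
def make_trimap(mask, radius=1):
    """Label each pixel 'F', 'B', or 'U' (unknown) from a binary mask.

    `mask` is a list of rows holding 0 (background) or 1 (foreground).
    The unknown band has a fixed half-width of `radius` pixels; an
    adaptive scheme would vary this width, which is not modelled here.
    """
    height, width = len(mask), len(mask[0])
    trimap = []
    for r in range(height):
        row = []
        for c in range(width):
            # Collect the labels in the square window around (r, c),
            # clipped at the image border.
            window = [
                mask[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            if all(window):
                row.append("F")
            elif not any(window):
                row.append("B")
            else:
                row.append("U")
        trimap.append(row)
    return trimap


# A single-row mask with one object boundary between columns 2 and 3.
example_trimap = make_trimap([[0, 0, 0, 1, 1, 1, 1]], radius=1)
```

In this example the two pixels on either side of the boundary fall into the unknown band, and only those pixels would need an alpha estimate.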

As a highly nonrigid object, the human face exhibits a high degree of variability in size, shape, color, and texture. This method develops a fast face detector, shown in Fig. 1.12, which consists of skin color filtering, a rejector cascade, and cascades of boosted face classifiers. The filter is used to clean up the non-skin regions in the color image during face detection. The rejector is designed to remove most of the non-face candidates while maintaining high face detection accuracy. The promising face-like locations are then examined by the final boosted face classifier. Note that the real-time segmentation system for human faces can be easily extended to other applications. For example, if the coarse segmentation is performed on an appropriately defined body region, this work can be extended to solve the more challenging “head-shoulder segmentation” problem.
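The detector structure described above (a cheap filter first, then a rejector, then the expensive classifier) can be sketched as a generic early-exit cascade. This is an illustration of the control flow only, not the detector of [62]: the RGB skin rule, the candidate fields `mean_rgb`, `cheap_score`, and `strong_score`, and both thresholds are hypothetical placeholders for the trained components.

```python
def is_skin(rgb):
    """Crude RGB rule of thumb standing in for the skin color filter."""
    r, g, b = rgb
    return (
        r > 95 and g > 40 and b > 20
        and r > g and r > b
        and (max(rgb) - min(rgb)) > 15
    )


def skin_filter(candidate):
    return is_skin(candidate["mean_rgb"])


def rejector(candidate):
    # Loose threshold: discard obvious non-faces, keep anything plausible.
    return candidate["cheap_score"] >= 0.3


def boosted_classifier(candidate):
    # Strict threshold: only applied to candidates that survived so far.
    return candidate["strong_score"] >= 0.8


STAGES = [skin_filter, rejector, boosted_classifier]


def run_cascade(candidates, stages=STAGES):
    """Pass candidates through the stages in order, dropping on first failure.

    Returns the surviving candidates and the number of stage evaluations,
    which shows how early rejection saves work.
    """
    survivors = []
    evaluations = 0
    for candidate in candidates:
        accepted = True
        for stage in stages:
            evaluations += 1
            if not stage(candidate):
                accepted = False
                break
        if accepted:
            survivors.append(candidate)
    return survivors, evaluations


# Hypothetical candidate windows with precomputed scores.
windows = [
    {"name": "wall", "mean_rgb": (200, 200, 200), "cheap_score": 0.1, "strong_score": 0.0},
    {"name": "hand", "mean_rgb": (210, 150, 120), "cheap_score": 0.2, "strong_score": 0.1},
    {"name": "face_like", "mean_rgb": (205, 145, 120), "cheap_score": 0.6, "strong_score": 0.4},
    {"name": "face", "mean_rgb": (215, 160, 130), "cheap_score": 0.9, "strong_score": 0.95},
]
faces, stage_calls = run_cascade(windows)
```

Here only one of the four windows reaches acceptance, and the cascade performs 9 stage evaluations instead of the 12 needed if every stage ran on every window. The saving grows when most windows are rejected by the first, cheapest stage.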

In addition, an early class-based segmentation method is discussed in [60], which aims to capture the common characteristics from a stored

