A SYSTEM FOR VIDEO OBJECT SEGMENTATION - Video Object Extraction and Representation: Theory and Applications


A System for Video Object Segmentation

a fitness metric for the surface representation of a video object and formulate the video object segmentation problem as follows:

$$\arg\min_{S} \left( E_{\text{vobject}}(S, I, M) \right) \tag{4.1}$$

where $E_{\text{vobject}}$ is an energy function, $S$ is the spatio-temporal surface that represents the video object, $I$ is the video sequence, and $M$ is the a priori knowledge of the object.

This energy function has a lower energy when the surface $S$ is consistent with $I$ and $M$.
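
The arg-min structure of Eq. 4.1 can be sketched as follows. This is only an illustration under loud assumptions: the surface is represented as a set of (x, y, t) voxels, the search is restricted to a short list of candidate surfaces, and `toy_energy`, `segment`, and the `threshold` entry of the model are hypothetical names, not the energy or the optimizer used in this chapter.

```python
def toy_energy(surface, video, model):
    """Hypothetical stand-in for E_vobject(S, I, M).

    surface: set of (x, y, t) voxels assumed to belong to the object
    video:   dict mapping (x, y, t) -> intensity in [0, 1]
    model:   dict with an assumed brightness 'threshold' for the object
    """
    mismatches = 0
    for voxel, intensity in video.items():
        looks_like_object = intensity > model["threshold"]
        if looks_like_object != (voxel in surface):
            mismatches += 1
    # Fraction of voxels where the surface disagrees with the video and model.
    return mismatches / len(video)


def segment(candidates, video, model, energy=toy_energy):
    """Return (surface, energy) for the lowest-energy candidate (Eq. 4.1)."""
    return min(((s, energy(s, video, model)) for s in candidates),
               key=lambda pair: pair[1])


# Two 4x4 frames with a bright 2x2 object in the middle of each.
video = {(x, y, t): 1.0 if 1 <= x <= 2 and 1 <= y <= 2 else 0.0
         for x in range(4) for y in range(4) for t in range(2)}
model = {"threshold": 0.5}

empty = set()                                      # nothing is object
full = set(video)                                  # everything is object
exact = {v for v, i in video.items() if i > 0.5}   # only the bright voxels

best_surface, best_energy = segment([empty, full, exact], video, model)
```

Under these assumptions the candidate that agrees with both the video and the model has the lowest energy, so it is the one `segment` returns.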

At the highest level, we split the information into two basic classes: internal and external energies, $E_{\text{internal}}$ and $E_{\text{external}}$, respectively.

$$E_{\text{vobject}}(S, I, M) = E_{\text{internal}}(S, M) + E_{\text{external}}(S, I) \tag{4.2}$$

External energies are deductive, i.e., the video object membership is deduced from visual artifacts. Internal energies are inductive, i.e., an assumption about the video object is made and the surface tries to fit the assumption. As shown in Eq. 4.2, these components differ in their functional arguments: internal energies depend only upon the surface and the model description; external energies depend upon the surface and the video sequence. Internal energies counterbalance the external energies' tendency to fit (and overfit) the visual artifacts in the video sequence.
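
The counterbalancing role of the internal energy in Eq. 4.2 can be illustrated with a toy example. Everything below is an assumption made for illustration: the voxel-set representation of the surface, the brightness test used as the external energy, and the `expected_size` and `weight` entries of the model are hypothetical and are not taken from this chapter.

```python
def e_external(surface, video):
    """Toy E_external(S, I): voxels where brightness and membership disagree."""
    return float(sum((intensity > 0.5) != (voxel in surface)
                     for voxel, intensity in video.items()))


def e_internal(surface, model):
    """Toy E_internal(S, M): penalize deviation from an assumed object size."""
    return model["weight"] * abs(len(surface) - model["expected_size"])


def e_vobject(surface, video, model):
    """Eq. 4.2: the total energy is the sum of the two classes."""
    return e_internal(surface, model) + e_external(surface, video)


# One 5x5 frame: a bright 2x2 object plus one isolated bright noise pixel.
object_voxels = {(x, y, 0) for x in (1, 2) for y in (1, 2)}
noise_voxel = (4, 4, 0)
video = {(x, y, 0): 0.0 for x in range(5) for y in range(5)}
for voxel in object_voxels | {noise_voxel}:
    video[voxel] = 1.0
model = {"expected_size": 4, "weight": 2.0}   # assumed prior knowledge M

clean = set(object_voxels)                 # ignores the noise pixel
overfit = object_voxels | {noise_voxel}    # fits every bright pixel
```

Note that `e_internal` never sees the video and `e_external` never sees the model, mirroring the functional arguments in Eq. 4.2. With these toy terms the external energy alone prefers `overfit`, while the combined energy prefers `clean`.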

External and internal energies can be further classified by their time dependence and applicability, respectively. External energies can be split into two components, $E_{\text{image}}$ and $E_{\text{motion}}$, based upon whether they use temporal correlation in their analysis. $E_{\text{image}}$ does its analysis piecemeal, i.e., as if each frame were in isolation, while $E_{\text{motion}}$ treats the video sequence as a whole in its analysis.

$$E_{\text{external}}(S, I) = \int E_{\text{image}}\left( S(\ast, t'),\ I(x, y, t)\big|_{t=t'} \right) dt' + E_{\text{motion}}(S, I) \tag{4.3}$$

where $I(x, y, t)|_{t=t'}$ is the frame of the video sequence at time $t'$.
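
For a sampled video, the integral in Eq. 4.3 becomes a sum over frames. The sketch below assumes the surface is stored as one set of (x, y) pixels per frame, so that `surface[t]` plays the role of $S(\ast, t')$. Both `e_image` and `e_motion` are hypothetical toy terms chosen only to show which data each term is allowed to see; they are not the terms defined in this chapter.

```python
def e_image(mask, frame):
    """Toy single-frame term: pixels where brightness and membership disagree."""
    return sum((value > 0.5) != ((x, y) in mask)
               for y, row in enumerate(frame)
               for x, value in enumerate(row))


def e_motion(surface, video):
    """Toy whole-sequence term: count pixels whose membership changes between
    consecutive frames without an intensity change, or vice versa."""
    cost = 0
    for t in range(len(video) - 1):
        for y, row in enumerate(video[t]):
            for x, value in enumerate(row):
                intensity_changed = value != video[t + 1][y][x]
                mask_changed = ((x, y) in surface[t]) != ((x, y) in surface[t + 1])
                cost += intensity_changed != mask_changed
    return cost


def e_external(surface, video, dt=1.0):
    """Eq. 4.3 with the integral over t' replaced by a sum over frames."""
    per_frame = sum(e_image(surface[t], video[t]) * dt
                    for t in range(len(video)))
    return per_frame + e_motion(surface, video)


# A bright pixel moves from x=0 to x=1 in a two-frame video of 1x3 pixels.
video = [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]]]
tracking = [{(0, 0)}, {(1, 0)}]   # follows the bright pixel
static = [{(0, 0)}, {(0, 0)}]     # stays where the object started
```

Here `e_image` is evaluated one frame at a time, whereas `e_motion` needs pairs of frames, which is the distinction the text draws between the two components.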

Internal energies are split by their dependence upon a priori information: the energies $E_{\text{world}}$, which are based upon environmental assumptions and are valid for all video objects, and the energies $E_{\text{object}}$, which are based upon an assumption of object class:

$$E_{\text{internal}}(S, M) = E_{\text{world}}(S) + E_{\text{object}}(S, M) \tag{4.4}$$
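
The split in Eq. 4.4 can be sketched in the same spirit. The choices below are assumptions for illustration only: a perimeter count stands in for an environmental assumption (objects tend to be compact), and a deviation from a hypothetical `expected_size` stands in for an object-class assumption. Neither is taken from this chapter.

```python
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def e_world(surface):
    """Toy E_world(S): number of exposed in-frame faces (a perimeter).

    It needs no model, so it applies to any video object.
    """
    return sum((x + dx, y + dy, t) not in surface
               for (x, y, t) in surface
               for dx, dy in NEIGHBORS)


def e_object(surface, model):
    """Toy E_object(S, M): deviation from the size assumed for the class."""
    return abs(len(surface) - model["expected_size"])


def e_internal(surface, model):
    """Eq. 4.4: sum of the object-independent and class-specific terms."""
    return e_world(surface) + e_object(surface, model)


model = {"expected_size": 4}                                # assumed class prior
block = {(x, y, 0) for x in (1, 2) for y in (1, 2)}         # compact 2x2 region
scattered = {(0, 0, 0), (2, 0, 0), (0, 2, 0), (2, 2, 0)}    # four isolated pixels
```

Both surfaces match the assumed size, so only the object-independent term separates them: the compact block has the lower internal energy.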

While environmental assumptions do not require any information about the object, their generality usually limits their power. Object-based
