A SYSTEM FOR VIDEO OBJECT SEGMENTATION - Video Object Extraction and Representation: Theory and Applications


As shown in Figure 4.21, this sequence is problematic because motion estimation is difficult. Most of the moving regions are homogeneous and confuse the motion estimation (the aperture problem). Thus, most of the usable information comes from the visual edges. Also, the white flagpole occludes the container ship and splits the initial object estimate into two objects. In the current system, this type of error is unrecoverable. In the future, high-level scene understanding may avoid these problems. Objective results (defined in Section 4.23) for the container sequence are shown in Figure 4.24.

RESULTS FROM HALL-MONITOR SEQUENCE

The hall monitor sequence (frames 60-69) has one person walking down the hall. The camera and background are stationary and the person is walking away from the camera.

As shown in Figure 4.22, this sequence is the most troublesome for our system. Since the figure is non-rigid, the bootstrap stage incorrectly finds two differently moving regions for the single object and, in the current system, we cannot recover from this problem. If we knew a priori that the background was stationary, then we could rely upon change detection more strongly. The adaptive or heuristic weighting of information sources is part of our future research. Most regions on the person are homogeneous, giving little or no motion information. Unfortunately, no ground truth is available for the hall monitor sequence, so no objective results could be calculated.
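The text does not say how change detection is computed. One common formulation for a stationary camera is thresholded frame differencing, and the sketch below assumes that formulation purely for illustration. The frame format, the function name `change_mask`, and the threshold value are hypothetical and are not taken from the system described here.

```python
from typing import List

# A grayscale frame as rows of integer intensities (hypothetical format).
Frame = List[List[int]]


def change_mask(previous: Frame, current: Frame, threshold: int = 20) -> List[List[bool]]:
    """Mark pixels whose absolute intensity change exceeds the threshold."""
    return [
        [abs(cur - prev) > threshold for prev, cur in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


# Toy frames: only the middle column changes by more than the threshold.
previous = [[10, 10, 10], [10, 10, 10]]
current = [[10, 200, 10], [10, 190, 12]]
mask = change_mask(previous, current)
```

With a background known to be stationary, a mask like this could be given more weight relative to motion estimates in homogeneous regions, which is the trade-off the paragraph above describes.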

AN OBJECTIVE MEASURE OF SEGMENTATION QUALITY

In recent years, a major difficulty in evaluating the results of content-based video processing has been the lack of good objective measures.

We introduce a simple measure called the 2-D quality vector, $\vec{Q}$, defined in Eq. 4.31 below:

$$\vec{Q} = \left\langle \frac{\|P_{truth} \cap P_{result}\|}{\|P_{truth}\|},\ \frac{\|P_{truth} \cap P_{result}\|}{\|P_{result}\|} \right\rangle \in [0, 1] \times [0, 1] \qquad (4.31)$$

where $P_{truth}$ is the set of pixels in the ground truth segmentation and $P_{result}$ is the set of pixels in the segmentation result. The components of the quality vector are called, respectively, the content and coverage percentage.
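As a minimal sketch of how Eq. 4.31 could be evaluated, the code below assumes the equation reads as the size of the overlap divided by the size of the ground truth, followed by the size of the overlap divided by the size of the result, with the norm taken to be a pixel count. Representing each segmentation as a set of (row, column) coordinates and the name `quality_vector` are illustrative assumptions, not part of the original system.

```python
from typing import Iterable, Set, Tuple

# A pixel as a (row, column) coordinate (hypothetical representation).
Pixel = Tuple[int, int]


def quality_vector(truth: Iterable[Pixel], result: Iterable[Pixel]) -> Tuple[float, float]:
    """Return (|T & R| / |T|, |T & R| / |R|) for two pixel sets T and R."""
    truth_set: Set[Pixel] = set(truth)
    result_set: Set[Pixel] = set(result)
    if not truth_set or not result_set:
        # Either ratio would divide by zero for an empty segmentation.
        raise ValueError("both segmentations must contain at least one pixel")
    overlap = len(truth_set & result_set)
    return overlap / len(truth_set), overlap / len(result_set)


# Toy example: a 2x2 ground-truth square and a result shifted one column right.
truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
result = {(0, 1), (0, 2), (1, 1), (1, 2)}
q = quality_vector(truth, result)
print(q)  # (0.5, 0.5)
```

Both components stay in [0, 1] because the overlap can never be larger than either set, and a result identical to the ground truth yields (1.0, 1.0).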

As shown in Figure 4.23, $\vec{Q}$ allows us to easily interpret the results from segmentation algorithms. An ideal segmentation algorithm would give a $\vec{Q}$ of $\langle 1, 1 \rangle$ and any refinement to the segmentation
