A SYSTEM FOR VIDEO OBJECT SEGMENTATION - Video Object Extraction and Representation: Theory and Applications


homogeneous movement represent conservative core regions of objects and bootstrap our surface optimization.

We use PCA to simplify our motion field [Kung et al., 1996] [Lin et al., 1996]. In Section 2.4 we introduced a simple concept called motion paths, which groups pixels by the correspondence of their projected motion. Assuming that pixels from the same object move together, we can find the motion paths of different objects from the principal component of their projected motions. For example, for a 10-frame sequence, we derive for each motion path a vector of length 20 made up of the 10 x-y components of its projected motion. After estimating the correlation matrix from the vectors of all motion paths, PCA extracts the first eigenvector of this correlation matrix and, for each motion path, the magnitude of its component along that eigenvector. Next, we spatially cluster the PCA values of our motion paths. Coupling the PCA value with the initial path position, each motion path projects to a point in a 3-D feature space. Since initial path positions are mutually exclusive, the projection of the motion paths into this feature space forms a functional surface (a single PCA value for each position). After smoothing the surface, the peaks and saddles on this surface correspond to regions of homogeneous movement. Using estimated partial derivatives of this surface, we cluster the motion paths by their peak / saddle regions and derive initial conservative surface estimates.
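
The PCA step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in this system: the function names (`path_vectors`, `correlation_matrix`, `first_eigenvector`, `pca_features`) are hypothetical, the correlation matrix is assumed to be the uncentred second-moment estimate, and power iteration is swapped in as one simple way to obtain the first eigenvector.

```python
import math

def path_vectors(paths):
    """Flatten each motion path [(dx, dy), ...] into [dx1, dy1, dx2, dy2, ...]."""
    return [[c for step in path for c in step] for path in paths]

def correlation_matrix(vectors):
    """Estimate R = (1/N) * sum_i v_i v_i^T (uncentred; an assumption of this sketch)."""
    n, d = len(vectors), len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) / n for j in range(d)]
            for i in range(d)]

def first_eigenvector(matrix, iterations=200):
    """Approximate the dominant eigenvector of a symmetric matrix by power iteration."""
    d = len(matrix)
    vec = [1.0 / math.sqrt(d)] * d  # uniform unit-length starting guess
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no motion at all, or start orthogonal to the eigenvector
            break
        vec = [x / norm for x in nxt]
    return vec

def pca_features(paths, start_positions):
    """Return one (x, y, score) feature per motion path.

    score is the component of the path's motion vector along the first
    eigenvector; (x, y) is the path's initial position.
    """
    vectors = path_vectors(paths)
    axis = first_eigenvector(correlation_matrix(vectors))
    scores = [sum(a * b for a, b in zip(axis, v)) for v in vectors]
    return [(x, y, s) for (x, y), s in zip(start_positions, scores)]
```

For a 10-frame sequence, each path passed to `pca_features` holds 10 (x, y) motion components, so each flattened vector has length 20, matching the example in the text. The returned (x, y, score) triples are the points of the 3-D feature space; the surface smoothing and the peak / saddle clustering are not shown here.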

BOOTSTRAP RESULTS

As shown in Figure 4.12, the bootstrap stage yields conservative estimates for initialization of our video object segmentation system. When our extraction of the motion field is accurate, the bootstrap stage robustly finds the core regions of objects from their homogeneous motion. These regions become our conservative surface estimates, S_initial, and are suitable for surface initialization, multiple object separation, and the containment, locality, and Voronoi Ordered Space components of E_object. However, since we use only motion in our object detection, we cannot find objects that move with the background.

The results from the MPEG-4 test sequences show the advantages and disadvantages of the bootstrap stage. For the coastguard sequence, the bootstrap performs well: all three foreground objects (the large coastguard ship, the smaller motorboat, and the wake from the motorboat) are found. In the container sequence, the bootstrap stage detects the two objects (the small boat and the large container ship). Unfortunately, due to occlusion by the white pole, the container ship is split into two distinct objects. In the current system, we cannot join these two objects back together, showing the need for higher-level anal-
