INTRODUCTION TO CONTENT-BASED VISUAL PROCESSING - Video Object Extraction and Representation: Theory and Applications


tioned into volumes onto which each physical object projects. The value of I(x, y, t) over each volume is the projection of one and only one object.
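The partition described above can be pictured with a label video. Suppose a hypothetical array `labels[t][y][x]` records which object projects onto pixel (x, y) at time t. In practice this ground truth is exactly what object extraction tries to estimate, so the sketch assumes it is already known. Grouping samples by label then yields the per-object volumes, and these volumes are disjoint by construction. The following is a minimal pure-Python sketch:

```python
from collections import defaultdict


def partition_into_volumes(labels):
    """Group every (x, y, t) sample of a label video into per-object volumes.

    labels[t][y][x] is a hypothetical object index telling which single
    physical object projects onto pixel (x, y) in frame t.
    """
    volumes = defaultdict(set)
    for t, frame in enumerate(labels):
        for y, row in enumerate(frame):
            for x, obj in enumerate(row):
                volumes[obj].add((x, y, t))
    return dict(volumes)


def is_partition(volumes, shape):
    """Check that the volumes are disjoint and together cover every sample.

    shape is (T, H, W); coordinates are assumed to lie inside that range.
    """
    T, H, W = shape
    total = sum(len(v) for v in volumes.values())
    union = set().union(*volumes.values())
    # Equal sizes mean no sample was counted twice (disjoint);
    # a total of T*H*W means every sample is covered.
    return total == T * H * W and len(union) == total


# Two frames of a 2x3 video: object 0 is background, object 1 moves right.
labels = [
    [[0, 1, 0],
     [0, 1, 0]],
    [[0, 0, 1],
     [0, 0, 1]],
]
vols = partition_into_volumes(labels)
print(sorted(vols[1]))                 # [(1, 0, 0), (1, 1, 0), (2, 0, 1), (2, 1, 1)]
print(is_partition(vols, (2, 2, 3)))   # True
```

Each sample lands in exactly one set, which mirrors the statement that I(x, y, t) over each volume comes from one and only one object.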

Deriving physical objects from the video sequence is the true aim of computer vision, and it is a notoriously difficult problem. The representation of a physical object is the three-dimensional representation that matches the physical ground truth. From the video sequence, we must not only differentiate between physical objects but also consider how these physical objects project onto the video space. However, what we see is merely the projection of the physical world, and much information about the physical object is lost in the process of projection. For instance, in Fig. 1.1b, any number of objects with a square face could have created the same projection. Restricting attention to the projections of the physical objects in the video space vastly simplifies the problem, because it avoids two major issues in computer vision:

1. Reconstruction of the 3-D physical structure.
2. Description of how the 3-D physical structure projects onto video space, e.g., camera and perspective models.
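A toy example makes the information lost through projection concrete. It assumes an orthographic camera that simply discards depth. This camera model is chosen only for illustration, since the text deliberately sets camera models aside. Under that assumption, a cube and a deeper box that share the same square face produce identical projections, in the spirit of the Fig. 1.1b example. Objects are modeled as sets of occupied integer voxels, and the helper `box` is hypothetical:

```python
def project_orthographic(voxels):
    """Project occupied 3-D voxels onto the image plane by dropping depth z.

    Assumption: a toy orthographic camera looking along the z axis,
    used only to illustrate that projection discards information.
    """
    return {(x, y) for (x, y, z) in voxels}


def box(width, height, depth):
    """Hypothetical helper: occupied voxels of an axis-aligned box at the origin."""
    return {(x, y, z)
            for x in range(width)
            for y in range(height)
            for z in range(depth)}


cube = box(2, 2, 2)
long_box = box(2, 2, 5)
print(cube == long_box)                                              # False: different 3-D objects
print(project_orthographic(cube) == project_orthographic(long_box))  # True: same square face
```

Recovering the depth extent from the projection alone is impossible here, which is why item 1 above is hard and why working directly in the video space is attractive.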

Although the 3-D physical object is the ground truth from which the video sequence is derived, we settle for the video object, a simpler intermediate representation defined below.

DEFINITION 1.1 (VIDEO OBJECT AND VIDEO OBJECT PLANE) The intensity function associated with the projection of one physical object is called its Video Object. A Video Object Plane (VOP) is the subset of the Video Object that lies within a single frame.
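Definition 1.1 can be read as a data structure. A Video Object is the intensity I(x, y, t) restricted to the samples onto which one object projects. A VOP is that restriction taken at a single t. The following minimal sketch stores both as dictionaries keyed by coordinates. It assumes the per-pixel object labels are known. The function names and the `labels` array are illustrative only, and are not taken from any particular codec or library:

```python
def video_object(intensity, labels, obj):
    """Video Object: intensity values I(x, y, t) on the support of one object.

    Returns a dict mapping (x, y, t) -> intensity. `labels[t][y][x]` is a
    hypothetical per-pixel object index, assumed to be known here.
    """
    return {
        (x, y, t): intensity[t][y][x]
        for t, frame in enumerate(labels)
        for y, row in enumerate(frame)
        for x, label in enumerate(row)
        if label == obj
    }


def video_object_plane(vo, t):
    """VOP: the part of a Video Object that lies within frame t."""
    return {(x, y): value for (x, y, tt), value in vo.items() if tt == t}


# A bright object (label 1) moving one pixel to the right between two frames.
intensity = [
    [[10, 200, 12],
     [11, 210, 13]],
    [[10, 14, 205],
     [12, 15, 215]],
]
labels = [
    [[0, 1, 0],
     [0, 1, 0]],
    [[0, 0, 1],
     [0, 0, 1]],
]
vo = video_object(intensity, labels, obj=1)
print(video_object_plane(vo, 1))  # {(2, 0): 205, (2, 1): 215}
```

A sparse dictionary keeps the sketch short. A real system would more likely store each VOP as a bounding box plus a binary shape mask alongside the texture.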

Video Objects are convenient intermediate representations that avoid the complexity of full computer vision problems while still serving an important function in content-based video processing. Video object extraction and representation may be considered only a preprocessing step toward determining a physical object. Even so, this preprocessing is itself difficult, and our solutions draw on techniques and research from many different areas: Electrical Engineering, Computer Science, Biology, and Psychology.

4. CONVERGENCE OF TECHNOLOGIES

Because the problem is mixed in nature (as shown in Figure 1.2), content-based processing requires the convergence of many fields: high-level and low-level analysis, image and temporal information extraction, representation and extraction, system design, algorithmic design, computational engines, and mathematics.
