Digital Signal Processing Reference
In-Depth Information
tioned into volumes onto which each physical object projects. The value
of I(x, y, t) over each volume is the projection of one and only one object.
Deriving physical objects from the video sequence is the true aim of
computer vision and is a notoriously difficult problem. The representa-
tion of the physical object is the three dimensional representation that
matches the physical ground truth. From the video sequence, we must
not only differentiate between physical objects, but also consider how
these physical objects project onto the video space. However, what we
see is merely the projection of the physical world and much information
about the physical object is lost through the process of projection. For
instance, in Fig. 1.1b, any number of objects with a square face could
have created the same projection. If we consider only the projection of
the physical objects in the video space, this restriction vastly simplifies
the problem by NOT dealing with two major issues in computer vision:
1. Reconstruction of the 3-D physical structure
2. Descriptions of how 3-D physical structure projects onto video space,
e.g., camera and perspective models.
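The loss of information under projection can be illustrated with a minimal sketch. It assumes an orthographic camera looking along the z axis, a simplification that the text does not state, and the names `project_orthographic`, `box`, `cube`, and `plate` are hypothetical and chosen only for this illustration.

```python
def project_orthographic(points):
    """Drop the depth coordinate z, keeping the set of (x, y) image positions.

    Assumption: an orthographic camera looking along the z axis.
    """
    return {(x, y) for (x, y, z) in points}


def box(width, height, depth):
    """Integer lattice points of an axis-aligned box with one corner at the origin."""
    return {(x, y, z)
            for x in range(width)
            for y in range(height)
            for z in range(depth)}


cube = box(4, 4, 4)    # a 4 x 4 x 4 cube
plate = box(4, 4, 1)   # a 4 x 4 plate that is only one unit deep

# Two different 3-D objects with a square face give the same 2-D projection.
same_projection = project_orthographic(cube) == project_orthographic(plate)
print(same_projection)
```

Both objects occupy the same square in the image, so the depth of the object cannot be recovered from this projection alone.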
Although the 3-D physical object is the ground truth from which
the video sequence is derived, we settle for the video object, a simpler
intermediate representation defined below.
DEFINITION 1.1 (VIDEO OBJECT AND VIDEO OBJECT PLANE) The
intensity function associated with a projection of one physical object is
called its Video Object. A Video Object Plane (VOP) is the subset of
the Video Object that lies within a single frame.
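Definition 1.1 can be made concrete with a small sketch. It assumes that a segmentation is already available as a label volume `labels[t][y][x]` recording which physical object projects onto each pixel. That label volume, the tiny two-frame sequence, and the function names `video_object` and `video_object_plane` are hypothetical and serve only as an illustration.

```python
def video_object(intensity, labels, object_id):
    """Return {(x, y, t): I(x, y, t)} restricted to the volume of one object.

    Assumption: labels[t][y][x] holds the id of the object projecting onto
    pixel (x, y) of frame t.
    """
    return {(x, y, t): intensity[t][y][x]
            for t, frame in enumerate(labels)
            for y, row in enumerate(frame)
            for x, label in enumerate(row)
            if label == object_id}


def video_object_plane(vo, t):
    """Return the part of a video object lying in frame t, as {(x, y): value}."""
    return {(x, y): value
            for (x, y, frame_t), value in vo.items()
            if frame_t == t}


# Two frames of size 2 x 3 (rows index y, columns index x).
intensity = [
    [[10, 11, 12],
     [13, 14, 15]],
    [[20, 21, 22],
     [23, 24, 25]],
]
labels = [
    [[0, 1, 1],
     [0, 1, 1]],
    [[0, 0, 1],
     [0, 0, 1]],
]

vo = video_object(intensity, labels, object_id=1)
vop0 = video_object_plane(vo, t=0)   # VOP of object 1 in the first frame
vop1 = video_object_plane(vo, t=1)   # VOP of object 1 in the second frame
```

Here the Video Object spans both frames, while each VOP is the slice of it that lies in a single frame.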
Video Objects are convenient intermediate representations that avoid
the complexity of computer vision problems while fulfilling an important
functionality in content-based video processing. Although video object
extraction and representation may be considered only a preprocessing
step for the determination of a physical object, this preprocessing is still
difficult and our solutions require techniques and research from many
different areas of Electrical Engineering, Computer Science, Biology and
Psychology.
4. CONVERGENCE OF TECHNOLOGIES
The complex nature of content-based processing necessitates the con-
vergence of many fields due to the mixed nature of the problem (as shown
in Figure 1.2): high-level and low-level analysis, image and temporal
information extraction, representation and extraction, system design,
algorithmic design, computational engines, and mathematics.
 