of video are used to extract high-level descriptors via support-vector machine (SVM) decision fusion. This increases the system's capability to retrieve videos through concept-based queries; for example, the concepts "dancing" and "gun shooting" are used to retrieve relevant video clips.
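As an illustration, a minimal sketch of such decision fusion is given below: one SVM per feature modality produces a concept confidence, and a second SVM fuses these confidences into a final decision. The modalities, the synthetic data, and the scikit-learn classifiers are illustrative assumptions, not the exact system described above.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical pre-extracted features for n video shots, two modalities.
n = 200
visual = rng.normal(size=(n, 32))      # e.g., color/texture descriptors
audio = rng.normal(size=(n, 16))       # e.g., spectral descriptors
labels = rng.integers(0, 2, size=n)    # 1 = concept present, e.g., "dancing"

# Stage 1: one SVM per modality, each outputting a concept confidence.
svm_v = SVC(probability=True).fit(visual, labels)
svm_a = SVC(probability=True).fit(audio, labels)
scores = np.column_stack([
    svm_v.predict_proba(visual)[:, 1],
    svm_a.predict_proba(audio)[:, 1],
])

# Stage 2: a fusion SVM combines the per-modality confidences into the
# final concept decision used to index shots for concept-based queries.
fusion = SVC(probability=True).fit(scores, labels)
print(fusion.predict_proba(scores[:5])[:, 1])   # fused concept scores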
1.3.5.5 Event Detection and Video Classification
Video classification exploits audio, textual, and visual information to classify videos into categories according to semantics such as events. Examples of such events are the "Touchdowns" and "Field goals" in American football. A video classification method based on high-level concepts is presented in Chap. 7. The method classifies recurring events of the games without using any domain knowledge, relying on MPEG-7 standard descriptors. The specific events are "Run play", "Field goal", and "Pass play". In addition, Chap. 9 presents a method for video genre classification. This method employs domain-knowledge-independent descriptors and an unsupervised clustering technique to identify video genres. Moreover, a systematic scheme detects events of interest, taking a video sequence as the query. After the video genre is identified, the query video is evaluated in a second stage by semantic view assignment, using the unsupervised probabilistic latent semantic analysis (PLSA) model; both the genre identification and view assignment tasks take the initially processed video representation as input and use unsupervised classifiers. Finally, in the third stage, the event of interest is detected by feeding the view labels into a hidden conditional random field (HCRF) structured prediction model.
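A rough sketch of the second stage is given below. PLSA itself has no standard implementation in scikit-learn, so NMF with a Kullback-Leibler loss is used here as a closely related stand-in, and the HCRF stage is only indicated; the data and all parameters are illustrative assumptions.

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)

# Hypothetical term-frequency matrix: rows are video shots, columns are
# counts of quantized low-level descriptors ("visual words").
counts = rng.poisson(2.0, size=(100, 50)).astype(float)

# Factorize into K latent semantic views (e.g., global view, close-up).
# NMF minimizing KL divergence serves as a stand-in for the PLSA model.
K = 4
model = NMF(n_components=K, beta_loss="kullback-leibler", solver="mu",
            max_iter=500, random_state=1)
shot_view = model.fit_transform(counts)    # per-shot view affinities

# Each shot receives its most probable view label; this label sequence
# is what an HCRF would consume for event detection in the third stage.
view_labels = shot_view.argmax(axis=1)
print(view_labels[:20])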
1.3.5.6 Video Object Segmentation
Video segmentation selects the portions of a video that contain meaningful structure, depending on the user's goal. If the goal is to obtain video portions corresponding to a single camera shot, a video parsing method is employed. However, if the goal is to segment the video according to an object of interest, a method for detecting and tracking video objects is needed. Chapter 7 discusses video object segmentation for both scenarios.
For video parsing, an algorithm detects shot transitions directly in the compressed video, using the energy histogram of the discrete cosine transform (DCT) coefficients. Transition regions are amplified by a two-sliding-window strategy that attenuates the low-pass-filtered frame distance. This achieves high detection rates at low computational complexity on the compressed video database.
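To make the idea concrete, the sketch below computes per-frame energy histograms of DCT coefficients, takes the histogram distance between consecutive frames, and compares a short-window peak against the long-window (low-pass) background distance. The binning, window sizes, and threshold ratio are hypothetical choices, not the chapter's exact scheme.

import numpy as np

def energy_histogram(dct_coeffs, bins=32, max_energy=1e4):
    # Normalized histogram of squared DCT coefficients for one frame.
    energies = dct_coeffs.ravel() ** 2
    hist, _ = np.histogram(energies, bins=bins, range=(0.0, max_energy))
    return hist / max(hist.sum(), 1)

def detect_transitions(frames_dct, w_short=2, w_long=10, ratio=3.0):
    # Two sliding windows: the short window locates a candidate peak in
    # the frame distance, while the long window estimates the low-pass
    # background level, attenuating gradual motion relative to true cuts.
    hists = np.array([energy_histogram(f) for f in frames_dct])
    dist = np.abs(np.diff(hists, axis=0)).sum(axis=1)   # L1 distance
    cuts = []
    for i in range(w_long, len(dist) - w_long):
        peak = dist[i - w_short:i + w_short + 1].max()
        background = dist[i - w_long:i + w_long + 1].mean()
        if dist[i] == peak and dist[i] > ratio * max(background, 1e-8):
            cuts.append(i + 1)      # transition between frames i and i+1
    return cuts

# Synthetic demo: 60 frames of DCT coefficients with one cut at frame 30.
rng = np.random.default_rng(2)
frames = [rng.normal(0, 5, (64, 64)) for _ in range(30)]
frames += [rng.normal(10, 20, (64, 64)) for _ in range(30)]
print(detect_transitions(frames))   # expected to report the cut near 30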
The method for object-based video segmentation produces a video structure that is more descriptive than the full video sequence. The video objects are automatically detected and tracked in the input video according to the user's preference. The segmentation method incorporates a shape prior to implement