of video are used to extract high-level descriptors via support-vector machine (SVM) decision fusion. This increases the system's capability to retrieve videos through concept-based queries; for example, the concepts "dancing" and "gun shooting" are used to retrieve relevant video clips.
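As an illustration, a minimal sketch of such decision fusion is given below: one SVM per feature modality produces a concept confidence, and a second SVM fuses these confidences into a final decision. The modalities, the synthetic data, and the scikit-learn classifiers are illustrative assumptions, not the exact system described above.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical pre-extracted features for n video shots, two modalities.
n = 200
visual = rng.normal(size=(n, 32))      # e.g., color/texture descriptors
audio = rng.normal(size=(n, 16))       # e.g., spectral descriptors
labels = rng.integers(0, 2, size=n)    # 1 = concept present, e.g., "dancing"

# Stage 1: one SVM per modality, each outputting a concept confidence.
svm_v = SVC(probability=True).fit(visual, labels)
svm_a = SVC(probability=True).fit(audio, labels)
scores = np.column_stack([
    svm_v.predict_proba(visual)[:, 1],
    svm_a.predict_proba(audio)[:, 1],
])

# Stage 2: a fusion SVM combines the per-modality confidences into the
# final concept decision used to index shots for concept-based queries.
fusion = SVC(probability=True).fit(scores, labels)
print(fusion.predict_proba(scores[:5])[:, 1])   # fused concept scores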
1.3.5.5 Event Detection and Video Classification
Video classification exploits audio, textual, and visual information to classify videos into categories according to semantics such as events. Examples of such events are the "Touchdowns" and "Field goals" in American football. A video classification method based on high-level concepts is presented in Chap. 7. The method classifies recurring events of the games without using any domain knowledge, relying on MPEG-7 standard descriptors. The specific events are "Run play", "Field goal", and "Pass play". In addition, Chap. 9 presents a method for video genre classification. This method employs domain-knowledge-independent descriptors and an unsupervised clustering technique to identify video genres. Moreover, a systematic scheme detects events of interest, taking a video sequence as the query. After the video genre is identified, the query video is evaluated in a second stage by semantic view assignment, using the unsupervised probabilistic latent semantic analysis (PLSA) model; both the genre identification and view assignment tasks take the initially processed video representation as input and use unsupervised classifiers. Finally, in the third stage, the event of interest is detected by feeding the view labels into a hidden conditional random field (HCRF) structured prediction model.
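A rough sketch of the second stage is given below. PLSA itself has no standard implementation in scikit-learn, so NMF with a Kullback-Leibler loss is used here as a closely related stand-in, and the HCRF stage is only indicated; the data and all parameters are illustrative assumptions.

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)

# Hypothetical term-frequency matrix: rows are video shots, columns are
# counts of quantized low-level descriptors ("visual words").
counts = rng.poisson(2.0, size=(100, 50)).astype(float)

# Factorize into K latent semantic views (e.g., global view, close-up).
# NMF minimizing KL divergence serves as a stand-in for the PLSA model.
K = 4
model = NMF(n_components=K, beta_loss="kullback-leibler", solver="mu",
            max_iter=500, random_state=1)
shot_view = model.fit_transform(counts)    # per-shot view affinities

# Each shot receives its most probable view label; this label sequence
# is what an HCRF would consume for event detection in the third stage.
view_labels = shot_view.argmax(axis=1)
print(view_labels[:20])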
1.3.5.6 Video Object Segmentation
Video segmentation selects the portions of a video that contain meaningful structure, depending on the user's goal. If the goal is to obtain video portions corresponding to a single camera shot, a video parsing method is employed. However, if the goal is to segment the video according to an object of interest, a method for detecting and tracking video objects is needed. Chapter 7 discusses video object segmentation for both scenarios.
For video parsing, an algorithm detects shot transitions directly in the compressed video, using the energy histogram of the discrete cosine transform (DCT) coefficients. Transition regions are amplified by a two-sliding-window strategy that attenuates the low-pass-filtered frame distance. This achieves high detection rates at low computational complexity on the compressed video database.
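To make the idea concrete, the sketch below computes per-frame energy histograms of DCT coefficients, takes the histogram distance between consecutive frames, and compares a short-window peak against the long-window (low-pass) background distance. The binning, window sizes, and threshold ratio are hypothetical choices, not the chapter's exact scheme.

import numpy as np

def energy_histogram(dct_coeffs, bins=32, max_energy=1e4):
    # Normalized histogram of squared DCT coefficients for one frame.
    energies = dct_coeffs.ravel() ** 2
    hist, _ = np.histogram(energies, bins=bins, range=(0.0, max_energy))
    return hist / max(hist.sum(), 1)

def detect_transitions(frames_dct, w_short=2, w_long=10, ratio=3.0):
    # Two sliding windows: the short window locates a candidate peak in
    # the frame distance, while the long window estimates the low-pass
    # background level, attenuating gradual motion relative to true cuts.
    hists = np.array([energy_histogram(f) for f in frames_dct])
    dist = np.abs(np.diff(hists, axis=0)).sum(axis=1)   # L1 distance
    cuts = []
    for i in range(w_long, len(dist) - w_long):
        peak = dist[i - w_short:i + w_short + 1].max()
        background = dist[i - w_long:i + w_long + 1].mean()
        if dist[i] == peak and dist[i] > ratio * max(background, 1e-8):
            cuts.append(i + 1)      # transition between frames i and i+1
    return cuts

# Synthetic demo: 60 frames of DCT coefficients with one cut at frame 30.
rng = np.random.default_rng(2)
frames = [rng.normal(0, 5, (64, 64)) for _ in range(30)]
frames += [rng.normal(10, 20, (64, 64)) for _ in range(30)]
print(detect_transitions(frames))   # expected to report the cut near 30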
The method for object-based video segmentation produces a video structure that is more descriptive than the full video sequence. The video objects are automatically detected and tracked in the input video according to the user's preference. The segmentation method incorporates a shape prior to implement