Indexing, Object Segmentation, and Event Detection in News and Sports Videos - Multimedia Database Retrieval: Technology and Applications

events, which are then utilized to build up a semantic description of video data that

will facilitate both tracking and searching for more instances of any given event.

For example, in the domain of news videos, an anchor shot can be detected as a

significant event by using a face detection technique. Given this anchor shot, we then

can use existing knowledge representation techniques to build up a news headline

for querying news story units. Similarly, semantic descriptions of sports videos can

be built from play events, such as pass and run plays in American football, which are

considered the highlights of a game.

To realize the goal of automatically tagging video data, this chapter

presents the methods for video parsing, indexing and content characterization of

news units (shot, group, and story), video object segmentation, face detection in

news video, and event detection in sports video. These methods can be applied for

content characterization, which is the prerequisite for the construction of semantic

description.

Section 7.2 presents a video parsing method to segment video sequences into

video shots, which then allows subsequent operations such as feature extraction

and shot characterization. The section examines an algorithm that detects shot

transitions (sharp cuts and dissolves) in the compressed domain, using

the energy histograms of DC coefficients. The segmentation result is enhanced by

using the ratio between two sliding windows to attenuate the low-pass filtered frame

distance and to amplify the transitional regions. This provides the advantage of

achieving high detection rates with low computational complexity. In the subsequent

content analysis, the resulting shots can be combined into higher-level units, such as

groups of shots and video stories.
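
As a rough illustration of the sliding-window idea described above, the following minimal sketch flags shot transitions from a sequence of per-frame histograms. It is not the algorithm of Section 7.2: it assumes the histograms (for example, energy histograms of DC coefficients) have already been extracted and normalized, it uses a plain L1 distance, and the function names, window sizes, and threshold are hypothetical choices made for the example.

```python
from typing import List, Sequence


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def frame_distances(histograms: Sequence[Sequence[float]]) -> List[float]:
    """Distance between every pair of consecutive frames."""
    return [
        histogram_distance(histograms[i], histograms[i + 1])
        for i in range(len(histograms) - 1)
    ]


def window_ratio(distances: Sequence[float], inner: int, outer: int,
                 eps: float = 1e-6) -> List[float]:
    """Mean distance in a small window divided by the mean in a larger one.

    Slowly varying distances give similar means in both windows (low ratio),
    while a transition stands out against its neighbourhood (high ratio).
    """
    ratios = []
    for i in range(len(distances)):
        small = distances[max(0, i - inner): i + inner + 1]
        large = distances[max(0, i - outer): i + outer + 1]
        small_mean = sum(small) / len(small)
        large_mean = sum(large) / len(large)
        ratios.append(small_mean / (large_mean + eps))
    return ratios


def detect_transitions(histograms: Sequence[Sequence[float]],
                       threshold: float = 2.0,
                       inner: int = 1,
                       outer: int = 5) -> List[int]:
    """Return indices of frames that start a new shot (assumed parameters)."""
    d = frame_distances(histograms)
    r = window_ratio(d, inner, outer)
    boundaries = []
    for i, ratio in enumerate(r):
        neighbourhood = d[max(0, i - outer): i + outer + 1]
        # Keep only the local maximum so one transition is reported once.
        if ratio > threshold and d[i] == max(neighbourhood):
            boundaries.append(i + 1)
    return boundaries
```

A larger `inner` window would make the sketch more tolerant of gradual changes such as dissolves, at the cost of less precise localization; the values above are placeholders rather than tuned settings.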

Since news events happen daily, a person cannot afford to view all news on all

channels indiscriminately. To alleviate this problem, we need to develop a news

video database that digitally stores full news story units and provides an interactive

retrieval interface in which news headlines function as queries. To this end, it is

necessary to organize video content in terms of small, single-story units, instead

of shots, which do not usually convey any coherent semantics to users. Users

seek video content in terms of events or stories, not in terms of changes in

visual appearance as in shots. Section 7.3 will look into the content characterization

of videos at the group and story unit levels, and demonstrate the retrieval of full

news stories by using news headlines as queries.
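
To make the shot, group, and story hierarchy concrete, here is a minimal sketch of how story units might be stored and looked up by headline. The class names, fields, and the simple word-overlap ranking are assumptions made for this illustration; the retrieval method of Section 7.3 is not described in this overview and may differ substantially.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Shot:
    start_frame: int
    end_frame: int


@dataclass
class Group:
    shots: List[Shot] = field(default_factory=list)


@dataclass
class Story:
    headline: str
    groups: List[Group] = field(default_factory=list)


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into a set of words."""
    return set(text.lower().split())


def search_stories(stories: List[Story], query: str) -> List[Story]:
    """Rank stories by how many query words appear in their headline.

    Word overlap is a stand-in for a real similarity measure.
    """
    query_words = tokenize(query)
    scored = []
    for story in stories:
        overlap = len(query_words & tokenize(story.headline))
        if overlap > 0:
            scored.append((overlap, story))
    # Stable sort: stories with equal overlap keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [story for _, story in scored]
```

Returning whole `Story` objects, rather than individual shots, reflects the point made above that users look for coherent stories and not for visually homogeneous segments.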

Section 7.4 presents video segmentation methods based on the object of interest.

This method generates a segment from an input video sequence, which is more

descriptive than the full portion of the video, since the objects are automatically

detected and tracked from the input video according to the user preference.

The method incorporates a shape prior into Graph Cut for video object

segmentation. This shape prior improves the segmentation of objects with weak edges,

poor luminance distribution, and backgrounds with similar color and movement.
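
One common way to express a shape prior in a Graph Cut setting is as an extra term in the labeling energy. The generic form below is given only as an orientation aid; the symbols and weights are assumptions for this sketch, and the exact formulation in Section 7.4 may differ.

```latex
E(L) = \sum_{p \in P} D_p(l_p)
     + \lambda \sum_{(p,q) \in \mathcal{N}} V_{pq}(l_p, l_q)
     + \mu \sum_{p \in P} S_p(l_p)
```

Here $l_p$ is the object or background label of pixel $p$, $D_p$ is a data term (for example, based on color), $V_{pq}$ is a smoothness term over neighboring pixel pairs $\mathcal{N}$, $S_p$ penalizes labels that disagree with the expected shape, and $\lambda$ and $\mu$ are weights. When edges are weak or the background resembles the object, $D_p$ and $V_{pq}$ carry little information, and the $S_p$ term can keep the segmentation close to a plausible object outline.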

Section 7.5 presents a method to detect human faces from video sequences,

which incorporates the local histogram with optimal adaptive correlation. This alleviates

a common problem in conventional face detection methods, i.e., inconsistent

performance due to sensitivity to illumination variations such as local shadowing,

noise, and occlusion.
