events, which are then utilized to build up a semantic description of video data that
will facilitate both tracking and searching for more instances of any given event.
For example, in the domain of news videos, an anchor shot can be detected as a
significant event by using a face detection technique. Given this anchor shot, we can
then use existing knowledge representation techniques to build a news headline
for querying news story units. Similarly, semantic descriptions of sports videos can
be expressed in terms of play events, such as passes and runs in American football,
which are regarded as the highlights of a game.
To achieve the goal of automatically tagging video data, this chapter
presents methods for video parsing; indexing and content characterization of
news units (shot, group, and story); video object segmentation; face detection in
news video; and event detection in sports video. These methods can be applied to
content characterization, which is the prerequisite for constructing semantic
descriptions.
Section 7.2 presents a video parsing method to segment video sequences into
video shots, which then allows subsequent operations such as feature extraction
and shot characterization. The section examines an algorithm that detects shot
transitions (sharp cuts and dissolves) in the compressed domain, using
the energy histograms of DC coefficients. The segmentation result is enhanced by
using the ratio between two sliding windows to attenuate the low-pass filtered frame
distance and to amplify the transitional regions. This yields high detection rates
with low computational complexity. In the subsequent content analysis, the resulting
shots can be combined into higher-level units, such as groups of shots and
video stories.
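The following is a minimal sketch of this kind of compressed-domain transition detection, assuming the DC coefficients of each frame have already been extracted into a NumPy array (parsing the compressed bitstream is not shown). It approximates the energy histogram with a plain histogram of DC values and reads "the ratio between two sliding windows" as the ratio of a short-window to a long-window moving average of the frame distance. Both readings are assumptions, and the function names, bin count, window sizes and threshold are hypothetical rather than the chapter's settings.

```python
import numpy as np


def dc_histogram(dc, bins=32, value_range=(0, 2048)):
    """Normalized histogram of one frame's DC coefficients (a stand-in for the energy histogram)."""
    hist, _ = np.histogram(dc, bins=bins, range=value_range)
    return hist / max(hist.sum(), 1)


def frame_distances(dc_frames, bins=32, value_range=(0, 2048)):
    """L1 distance between consecutive histograms; d[i] compares frame i with frame i + 1."""
    hists = [dc_histogram(f, bins, value_range) for f in dc_frames]
    return np.array([np.abs(hists[i + 1] - hists[i]).sum() for i in range(len(hists) - 1)])


def moving_average(x, width):
    """Centered moving average (odd width, edge-padded): a simple low-pass filter."""
    pad = width // 2
    padded = np.pad(x, (pad, pad), mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")


def detect_transitions(dc_frames, small=3, large=15, threshold=2.0, eps=1e-6):
    """Return the indices of frames that start a new shot (hypothetical parameters)."""
    if len(dc_frames) < 2:
        return []
    d = frame_distances(dc_frames)
    # Short-window mean over long-window mean: stays around or below 1 inside a shot
    # and rises sharply where the distance jumps above its local background level.
    ratio = moving_average(d, small) / (moving_average(d, large) + eps)
    above = ratio > threshold
    boundaries, i = [], 0
    while i < len(d):
        if not above[i]:
            i += 1
            continue
        j = i
        while j < len(d) and above[j]:
            j += 1
        peak = i + int(np.argmax(d[i:j]))  # strongest frame distance within this run
        boundaries.append(peak + 1)        # d[peak] compares frames peak and peak + 1
        i = j
    return boundaries
```

Here the short window (`small`) plays the role of the low-pass filter, while dividing by the long window (`large`) suppresses distance that is uniformly elevated over a stretch of frames relative to localized jumps. Catching dissolves reliably would likely require matching the window sizes to the expected transition length.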
Since news events happen daily, a person cannot afford to view all news on all
channels indiscriminately. To alleviate this problem, we need to develop a news
video database that digitally stores full news story units and provides an interactive
retrieval interface in which news headlines function as queries. This requires
organizing video content in terms of small, single-story units instead
of shots, which usually do not convey any coherent semantics to users. Users
seek video content in terms of events or stories, not in terms of changes in
visual appearance as in shots. Section 7.3 examines the content characterization
of videos at the group and story unit levels, and demonstrates the retrieval of full
news stories using news headlines as queries.
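As an illustration of story-level organization, the sketch below groups a shot list into story units by opening a new unit at each anchor shot (the anchor-shot cue described earlier) and ranks the units against a headline by simple term overlap. The `Shot` and `Story` classes, the per-shot `text` field (for example captions or transcripts), and the overlap score are hypothetical choices for illustration, not the method of Section 7.3.

```python
from dataclasses import dataclass, field

_PUNCT = ".,:;!?\"'"


def terms(text):
    """Lower-cased word set with surrounding punctuation stripped (a deliberately naive tokenizer)."""
    return {w.strip(_PUNCT).lower() for w in text.split()} - {""}


@dataclass
class Shot:
    start: int       # first frame of the shot
    end: int         # last frame of the shot (inclusive)
    is_anchor: bool  # assumed output of an anchor-shot (face) detector
    text: str = ""   # hypothetical per-shot caption or transcript text


@dataclass
class Story:
    shots: list = field(default_factory=list)

    @property
    def span(self):
        return (self.shots[0].start, self.shots[-1].end)

    @property
    def words(self):
        return set().union(*(terms(s.text) for s in self.shots))


def group_into_stories(shots):
    """Open a new story unit at every anchor shot; other shots join the current story."""
    stories = []
    for shot in shots:
        if shot.is_anchor or not stories:
            stories.append(Story())
        stories[-1].shots.append(shot)
    return stories


def retrieve(stories, headline, top_k=1):
    """Rank story units by the fraction of headline terms they contain."""
    query = terms(headline)
    scored = [(len(query & st.words) / max(len(query), 1), i) for i, st in enumerate(stories)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [stories[i] for score, i in scored[:top_k] if score > 0]
```

Bag-of-words overlap is only a placeholder for whatever matching the retrieval interface actually uses; the point is that queries are answered with whole story units rather than individual shots.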
Section 7.4 presents a video segmentation method based on the object of interest.
This method extracts, from an input video sequence, a segment that is more
descriptive than the full video, since the objects are automatically
detected and tracked in the input video according to user preference.
The method incorporates a shape prior into Graph Cut for video object
segmentation. The shape prior improves the segmentation of objects with weak edges,
poor luminance distribution, and backgrounds of similar color and motion.
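To make the role of a shape prior concrete, here is a toy single-frame version of Graph Cut segmentation on a small grayscale image. The energy combines a data term (squared distance to assumed foreground and background means), a shape-prior penalty for disagreeing with a given binary prior mask, and a Potts smoothness term, and it is minimized exactly with a plain Edmonds-Karp max-flow. The binary-mask prior, the parameter names and the solver are assumptions for illustration; practical systems typically use faster max-flow solvers and richer priors, and the formulation in Section 7.4 may differ.

```python
from collections import deque

import numpy as np


def min_cut_source_side(n_nodes, edges, s, t, tol=1e-12):
    """Edmonds-Karp max-flow; returns the nodes still reachable from s (source side of a min cut)."""
    cap = [dict() for _ in range(n_nodes)]  # residual capacities
    for u, v, c in edges:
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > tol and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow


def segment_with_shape_prior(image, prior_mask, mu_fg, mu_bg, lam_shape=1.0, smooth=0.5):
    """Binary labeling minimizing data + shape-prior unary costs plus a Potts smoothness term."""
    h, w = image.shape
    n = h * w
    s, t = n, n + 1  # source = foreground, sink = background
    edges = []
    for y in range(h):
        for x in range(w):
            p = y * w + x
            inside = bool(prior_mask[y, x])
            # Unary costs: squared distance to the class mean plus a penalty for disagreeing with the prior.
            cost_fg = (image[y, x] - mu_fg) ** 2 + lam_shape * (0.0 if inside else 1.0)
            cost_bg = (image[y, x] - mu_bg) ** 2 + lam_shape * (1.0 if inside else 0.0)
            edges.append((s, p, cost_bg))  # cut when p ends up on the background side
            edges.append((p, t, cost_fg))  # cut when p ends up on the foreground side
            for yy, xx in ((y, x + 1), (y + 1, x)):  # 4-neighbourhood, each pair once
                if yy < h and xx < w:
                    q = yy * w + xx
                    edges += [(p, q, smooth), (q, p, smooth)]
    source_side = min_cut_source_side(n + 2, edges, s, t)
    mask = np.zeros((h, w), dtype=bool)
    for p in source_side - {s}:
        mask[p // w, p % w] = True
    return mask
```

For instance, on a 6x6 image holding a bright square (value 1) plus a bright background column, with `mu_fg=1.0`, `mu_bg=0.0` and `smooth=0.1`, setting `lam_shape=0.0` labels both the square and the column as foreground, whereas `lam_shape=2.0` with the square as prior mask returns just the square. This mirrors the "background of similar color" case described above.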
Section 7.5 presents a method for detecting human faces in video sequences
that combines the local histogram with optimal adaptive correlation. This alleviates
a common problem of conventional face detection methods, namely inconsistent
performance caused by sensitivity to illumination variations (such as local shadowing),
noise, and occlusion.
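A rough sketch of illumination-tolerant template matching follows. It histogram-equalizes the template and every candidate window locally, then scores each match with zero-mean normalized cross-correlation (NCC). The chapter's "optimal adaptive correlation" is not specified here, so plain NCC is swapped in as a stand-in; the single-scale exhaustive search, the threshold and the function names are likewise assumptions.

```python
import numpy as np


def equalize(patch):
    """Histogram-equalize a patch through its empirical CDF (values mapped into (0, 1])."""
    flat = patch.ravel()
    cdf = np.searchsorted(np.sort(flat), flat, side="right") / flat.size
    return cdf.reshape(patch.shape)


def ncc(a, b, eps=1e-9):
    """Zero-mean normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))


def detect_face(frame, template, threshold=0.8):
    """Slide the template over the frame; return (best (y, x) or None, best score)."""
    th, tw = template.shape
    t_eq = equalize(template)
    best_score, best_pos = -1.0, None
    for y in range(frame.shape[0] - th + 1):
        for x in range(frame.shape[1] - tw + 1):
            # Equalizing each window locally cancels any strictly increasing intensity
            # change applied uniformly to that window (e.g. darker, gamma-like lighting).
            score = ncc(equalize(frame[y:y + th, x:x + tw]), t_eq)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return (best_pos, best_score) if best_score >= threshold else (None, best_score)
```

Because equalization depends only on the ordering of pixel values inside each window, a face darkened by a uniform, strictly increasing intensity mapping (for example `0.2 + 0.5 * face**2` on values in [0, 1]) produces the same equalized window as the original template. Shadows that vary within the window are only partly compensated.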