9.2 Video Representation and Genre Categorization
This section covers the first part of our proposed framework: generic feature
extraction with the bag-of-words (BoW) model and systematic genre categorization.
Figure 9.2 illustrates the details of each process.
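As a rough illustration of the BoW idea, the following Python sketch assigns each local descriptor to its nearest codeword and accumulates a normalized histogram that represents the whole video or frame. The function name `bow_histogram`, the two-dimensional descriptors, and the three-entry codebook are hypothetical toy values chosen for illustration, not the features or vocabulary used in this chapter. In practice the codebook is typically learned by clustering, for example with k-means.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def bow_histogram(descriptors, codebook):
    """Quantize descriptors against a codebook and return a normalized histogram.

    descriptors: iterable of equal-length numeric tuples (local features).
    codebook:    list of numeric tuples (visual words), same dimensionality.
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        # Index of the codeword closest to this descriptor.
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    # Normalize so that inputs with different numbers of descriptors are comparable.
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical toy data: three codewords and four descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]
hist = bow_histogram(descriptors, codebook)
print(hist)  # [0.25, 0.5, 0.25]
```

The resulting fixed-length histogram can be passed to any standard classifier, regardless of how many descriptors the input produced.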
9.2.1 Related Work
Video genre categorization was one of the earliest video analysis tasks to
draw researchers' interest. The task starts from a diverse group of video genres,
such as sports, music, news, and movies, and gradually moves to finer-grained
categorization, such as identifying individual sports genres. Representative
works are highlighted below. A major disadvantage common to these works, however,
is their heavy dependency on domain knowledge.
Fischer et al. [248] first proposed a method for classifying videos into five
different genres. Brezeale and Cook [249] provided an extensive survey of this field.
Table 9.1 gives a concise summary that combines the survey with more recent
works. Color features with C4.5 decision trees were used in [250]. Camera
motion features with statistical classifiers were chosen to classify six sports genres
in [251]. An audio-visual feature modified by principal component analysis (PCA) was
used to train a Gaussian mixture model (GMM) classifier in [245]. Semantic shots
(views) were used to help in genre categorization in [252]. Motion and color, as well
as audio features, were applied in [253]. Color features with a hierarchical support
vector machine (SVM) were used in [254]. High-level MPEG-7 features were
extracted and applied in multi-modality classifiers in [255]. The best classification
result reported to date is an accuracy of 95% on a dataset of eight different genres
[256]. These methods combine various forms of domain knowledge with supervised
classifiers to achieve automatic genre categorization.
As defined in [257], domain knowledge-based features can be divided into
two categories: cinematic-based features and object-based features. Cinematic-based
features involve mid- to high-level semantics derived from common video composition
or production rules, such as shots/views or events, while object-based features are
described by object properties such as color, shape, and texture, as well as
spatio-temporal object motion. As Table 9.1 shows, all reviewed works
are domain knowledge-dependent, either object-based or cinematic-based. A lack
of diversity, that is, a small number of different genres in the database, limits
the generality of these methods.