extraction methods, Kolekar and Palaniappan [285] took a top-down approach. They
first used audio features to find exciting video clips. The motion features of the whole
image volume, along with background color information, were then used for
view-type classification. Benmokhtar et al. [286] performed feature-level
fusion using dynamic PCA with an information-coding neural network (NN). At the
classification level, another NN was used to fuse the multi-modality inputs. However,
these supervised methods are limited by the availability of labeled data and are
therefore difficult to extend to larger scales.
Some other researchers pursued unsupervised methods for view classification.
Wang et al. [280] proposed an information-theoretic co-clustering method in
which mutual information was maximized by treating shot classes and features as
two random variables. Color histogram and perceived motion energy features
were used, with a test set covering four sports video genres. Zhong et al.'s
method was inspired by the spectral theory conventionally used to solve segmentation
problems in graph theory [281]. They proposed a spectral-division algorithm to find
the proper video shot clustering, which was tested on three sports videos using
an HSV color-space feature. Although these methods achieved good performance,
their extensibility and flexibility towards diverse genres and large-scale
datasets are very limited. This limitation is again due to the dependence of the
extracted features on domain knowledge.
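The mutual-information objective mentioned for the co-clustering approach can be illustrated with a small sketch. It only computes the mutual information between two discrete variables from a table of co-occurrence counts; it is not the co-clustering algorithm of Wang et al. The function name `mutual_information` and the toy tables are assumptions made for illustration.

```python
import math

def mutual_information(joint_counts):
    """Return I(X;Y) in bits from co-occurrence counts (rows: X, columns: Y)."""
    total = sum(sum(row) for row in joint_counts)
    # Marginal distributions of the row variable and the column variable.
    p_x = [sum(row) / total for row in joint_counts]
    p_y = [sum(col) / total for col in zip(*joint_counts)]
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing to the sum
                p_xy = n / total
                mi += p_xy * math.log2(p_xy / (p_x[i] * p_y[j]))
    return mi

# Hypothetical tables: rows are shot classes, columns are feature clusters.
aligned = [[5, 0], [0, 5]]      # each class uses its own feature cluster
unrelated = [[2, 2], [2, 2]]    # features carry no information about the class
print(mutual_information(aligned), mutual_information(unrelated))
```

A grouping of shots and features that keeps this quantity high preserves more of the dependence between the two variables, which is the intuition behind maximizing it.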
Table 9.3 compares the aforementioned methodologies in terms of feature
utilization and classification techniques. Color and texture are the two major global
features used by most works. Duan et al.'s work is the only one that proposed
middle-level features developed from low-level global features. The remaining works
either adopted additional popular global features, such as audio or
Gabor features, together with some production rule-based features, or did not use any.
While various global features are used, no local features have been applied.
Moreover, most of the supervised methods (except Duan et al.'s work) focus on a
single sport (soccer), while the unsupervised techniques cover various types of sports.
Unsupervised View Classification
This section introduces the middle-level view classification, in which the previously
built BoW model is again used as the feature representation. Since this work targets
large-scale videos, an unsupervised solution is more viable and applicable. Therefore,
we chose to use unsupervised probabilistic latent semantic analysis (PLSA)-based
models. PLSA has demonstrated promising results in analyzing co-occurrence data
of words and documents in text retrieval [287]. From a matrix factorization point of
view, PLSA belongs to a subgroup called non-negative matrix factorization, where
the factorized matrices are non-negative [288]. Because the codebook paradigm
maps visual features to a probability-based histogram of codewords, which is
necessarily non-negative, PLSA is a more suitable choice than other factorization
techniques, such as singular value decomposition or principal component analysis,
whose factors may contain negative entries.
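As a rough illustration of the PLSA factorization described above, the following sketch fits P(z|d) and P(w|z) by expectation-maximization on a small documents-by-codewords count matrix. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function names `plsa` and `log_likelihood`, the toy counts, the number of latent views, and the iteration count are all hypothetical.

```python
import math
import random

def _normalize(row):
    """Scale a non-negative row so it sums to 1 (uniform if the row is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [v / total for v in row]

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM. counts[d][w] is how often codeword w occurs in document d."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialisation; every row is a probability distribution.
    p_z_d = [_normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]      # P(z|d)
    p_w_z = [_normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]    # P(w|z)
    for _ in range(n_iter):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w), proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(joint)
                if norm == 0:
                    continue
                # M-step accumulation: expected counts assigned to each topic.
                for z in range(n_topics):
                    r = n * joint[z] / norm
                    new_zd[d][z] += r
                    new_wz[z][w] += r
        p_z_d = [_normalize(row) for row in new_zd]
        p_w_z = [_normalize(row) for row in new_wz]
    return p_z_d, p_w_z

def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over (d, w) of n(d,w) * log P(w|d), with P(w|d) = sum_z P(z|d) P(w|z)."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n > 0:
                p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                ll += n * math.log(max(p, 1e-300))
    return ll

# Toy BoW histograms (hypothetical): 4 shots described by 4 codewords.
toy_counts = [[8, 6, 0, 0], [7, 9, 0, 0], [0, 0, 9, 7], [0, 1, 6, 8]]
p_z_d, p_w_z = plsa(toy_counts, n_topics=2)
# Assign each shot to its most probable latent view.
views = [max(range(2), key=row.__getitem__) for row in p_z_d]
print(views, log_likelihood(toy_counts, p_z_d, p_w_z))
```

Both factors stay non-negative throughout because the updates only multiply, add, and normalize non-negative quantities, which is the property that makes the model compatible with histogram features.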