Scalable Video Genre Classification and Event Detection - Multimedia Database Retrieval: Technology and Applications


Therefore, a two-level bottom-up structure is proposed in this work for efficient codebook generation. At the bottom layer, individual genre codebooks are generated by 1st-level K-means clustering. At the upper layer, the 1st-level codebooks are used as the input to the 2nd-level K-means, which builds the generic codebook. This bottom-up structure reduces the heavy computation of measuring individual point-to-cluster-center distances in the K-means algorithm. Moreover, since the 1st-level K-means runs are independent of each other, distributed computing methods can be applied to further reduce the computation time. The numerical analysis is given in Sect. 9.4.1.
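
The two-level structure described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: it uses plain Lloyd's K-means with random initialization, and the names `kmeans`, `build_generic_codebook`, `k_genre`, and `k_generic` are hypothetical. Each genre is clustered separately, and only the resulting cluster centers, not the raw feature points, are passed to the 2nd-level K-means.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns k cluster centers (a codebook)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[j] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return centers


def build_generic_codebook(features_by_genre, k_genre, k_generic):
    """Two-level bottom-up codebook generation (illustrative sketch)."""
    # 1st level: one independent K-means per genre. These runs share no
    # state, so they could be distributed across workers.
    genre_codebooks = {}
    for genre, feats in features_by_genre.items():
        genre_codebooks[genre] = kmeans(feats, k_genre)

    # 2nd level: cluster only the 1st-level centers, which are far fewer
    # than the raw feature points.
    pooled = [c for cb in genre_codebooks.values() for c in cb]
    generic = kmeans(pooled, k_generic)
    return genre_codebooks, generic
```

With G genres, the 2nd-level K-means in this sketch sees only G × `k_genre` points, which is where the saving in point-to-center distance computations comes from.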

Another advantage of bottom-up K-means clustering lies in system update and scalability. When videos of a new genre are added to the dataset, a codebook update module computes the new genre's individual codebook. The result, together with the existing codebooks, is used to generate the new generic codebook by re-running only the 2nd-level K-means. When new videos are imported for an existing genre, the corresponding 1st-level K-means is applied to obtain the updated individual codebook, and the 2nd-level K-means is then re-run to update the generic codebook.
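
The update procedure can be illustrated with a small sketch. The class `CodebookIndex`, its method `add_videos`, and the counter `first_level_runs` are hypothetical names introduced here, and the sketch assumes that an existing genre is re-clustered on its old and new features combined, which the text does not specify. Both update cases reduce to the same two steps: one 1st-level K-means for the affected genre, followed by a re-run of the 2nd-level K-means over all stored codebooks.

```python
import random


def kmeans(points, k, iters=15, seed=0):
    """Compact Lloyd's K-means returning k cluster centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            groups[j].append(p)
        # Recompute each center as the mean of its group; keep the old
        # center when a group is empty.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


class CodebookIndex:
    """Keeps per-genre codebooks so that updates stay local (sketch)."""

    def __init__(self, k_genre, k_generic):
        self.k_genre = k_genre
        self.k_generic = k_generic
        self.features = {}    # genre -> list of feature vectors
        self.codebooks = {}   # genre -> 1st-level (individual) codebook
        self.generic = []     # 2nd-level (generic) codebook
        self.first_level_runs = 0

    def add_videos(self, genre, new_features):
        """Handle both cases: a brand-new genre or more data for an old one."""
        self.features.setdefault(genre, []).extend(new_features)
        # Only the affected genre is re-clustered at the 1st level.
        self.codebooks[genre] = kmeans(self.features[genre], self.k_genre)
        self.first_level_runs += 1
        # The 2nd level is always re-run, but only over codebook centers.
        pooled = [c for cb in self.codebooks.values() for c in cb]
        self.generic = kmeans(pooled, min(self.k_generic, len(pooled)))
```

In this sketch the codebooks of unaffected genres are never recomputed, so the cost of an update does not grow with the raw data of the other genres.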

9.2.3 Low-Level Genre Categorization

In our proposed method, at the genre categorization stage, a query video is expressed as a histogram Q that also uses the generic codebook and the BoW model. Then, a k-Nearest Neighbor (k-NN) classifier is applied with a defined dissimilarity measure between the query Q and each trained individual genre P. Consequently, the query video is assigned to the genre whose distribution is closest to that of the query under this measure. Technical details are presented in Sect. 9.4.1.
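
As a rough illustration of this stage, the sketch below builds a normalized BoW histogram by assigning each local feature to its nearest codeword, then picks the genre whose distribution P is closest to the query histogram Q. The chi-square distance used here is an assumption made for the example; the actual dissimilarity measure is the one defined in Sect. 9.4.1. The function names and the toy genres are hypothetical, and with a single distribution per genre the classifier reduces to a nearest-neighbor search over genres.

```python
def nearest_word(feature, codebook):
    """Index of the codeword closest to a feature (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(feature, codebook[j])),
    )


def bow_histogram(features, codebook):
    """Normalized frequency histogram of codeword assignments."""
    counts = [0] * len(codebook)
    for f in features:
        counts[nearest_word(f, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def chi_square(p, q, eps=1e-12):
    """Chi-square distance between two histograms (assumed measure)."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))


def classify_genre(query_features, genre_models, codebook):
    """Return the genre whose distribution P is closest to the query Q."""
    q = bow_histogram(query_features, codebook)
    return min(genre_models, key=lambda g: chi_square(genre_models[g], q))


# Toy usage with a three-word generic codebook and two made-up genres.
codebook = [[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]]
genre_models = {
    "soccer": bow_histogram(
        [[0.1, 0.2], [0.3, -0.1], [9.8, 10.1], [0.0, 0.4]], codebook
    ),
    "tennis": bow_histogram(
        [[19.9, 20.2], [20.1, 19.7], [10.2, 9.9], [20.3, 20.0]], codebook
    ),
}
query = [[20.0, 19.8], [19.5, 20.4], [0.2, 0.1]]
predicted = classify_genre(query, genre_models, codebook)
```

Because the genre models and the query share the same generic codebook, their histograms have the same bins and can be compared directly.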

By identifying the genre of the query video, subsequent processes are confined to a focused group, and the scale of computation is decreased. Therefore, advanced and sophisticated techniques can be used in middle/high-level video analysis. In the next step, training data is characterized by a frequency-based histogram representation. Each individual genre is modeled as a distribution, denoted by P, using training data of its own kind.

9.3 High-Level Event Detection Using Middle-Level View as Agent

Content-based video event detection is among the most popular quests in high-level semantic analysis. Unlike video abstraction and summarization, which target any interesting events happening in a video rush, event detection is constrained to a predefined request type (such as the third goal or the second penalty kick in a particular soccer match). In sports videos, a consumer's interest in events resides in the actual video content, more than just the information delivered.
