An interesting event tactic analysis is proposed by Zhu et al. [247], which goes beyond conventional event detection and exploits the cooperative nature and tactic patterns of team sports. Extensive experiments have been conducted on soccer.
Table 9.2 compares the aforementioned literature from the point of view of feature utilization. Most of the methods take multimodal feature inputs. Comparing the number of events processed suggests that the state event model scales better across various event scenarios. It is also worth noting that local visual features are not utilized in any of the methods. In addition, many of the methods, especially state event models, require middle-level semantic agents to bridge the gap between low-level features and high-level events, and these middle-level agents must be trained on labeled data. In contrast, the generic method presented in this work tackles the event detection problem using inputs obtained by unsupervised learning from unlabeled data.
9.3.2 Middle-Level Unsupervised View Classification
Once the video genre is identified, the next step is to classify the view type of each video frame in the query sequence. We first present a literature review, followed by the proposed unsupervised method.
9.3.2.1 Related Work
We summarize related works so that readers can compare the popular supervised approaches with the proposed unsupervised PLSA method. Based on our study, only two works use unsupervised techniques; we present them [280, 281] for completeness of the review.
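The proposed PLSA-based classifier is detailed later in the chapter; as orientation only, the sketch below shows how a plain PLSA model can be fitted with EM when frames are treated as documents, quantized low-level features as visual words, and latent aspects stand in for view types. The helper build_codebook_histograms, the three-aspect setting, and the iteration count are illustrative assumptions, not the chapter's implementation.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0, eps=1e-12):
    """Fit PLSA by EM to a frame-by-visual-word count matrix.

    counts  : (n_frames, n_words) array, n(d, w) = occurrences of visual
              word w in frame d (e.g. from a quantized color/texture codebook)
    n_topics: number of latent aspects (here: candidate view types)
    Returns P(z|d) and P(w|z).
    """
    rng = np.random.default_rng(seed)
    n_frames, n_words = counts.shape

    # Random initialization of the two conditional distributions.
    p_z_d = rng.random((n_frames, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)            # P(z|d)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)            # P(w|z)

    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) ∝ P(z|d) * P(w|z).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]  # (frames, words, topics)
        joint /= joint.sum(axis=2, keepdims=True) + eps

        # M-step: re-estimate P(w|z) and P(z|d) from expected counts.
        expected = counts[:, :, None] * joint            # n(d,w) * P(z|d,w)
        p_w_z = expected.sum(axis=0).T                   # sum over frames
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = expected.sum(axis=1)                     # sum over words
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z

# Each frame is then assigned the aspect with the highest posterior P(z|d);
# aspects can be mapped to view labels (global / zoom-in / close-up) by
# inspecting a few frames per aspect.
# counts = build_codebook_histograms(frames)   # hypothetical helper
# p_z_d, _ = plsa(counts, n_topics=3)
# view_of_frame = p_z_d.argmax(axis=1)
```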
Although the nomenclature varies, the fundamental purpose of middle-level views (shots) is to exploit certain production rules to aid high-level tasks. This frame-based label concept was first introduced by Xu et al., who defined three groups of views: global, zoom-in, and close-up [243]. Ekin and Tekalp [244] used a slightly different notation comprising long-shot, middle-shot, and close-up/out-of-field. Duan et al. [282] used a finer view/shot group classification, supported by innovative semantic features. These pioneering methods, along with other works such as [283-285], rely on decision tree classifiers to link low-level features to view/shot types. Xu et al. [243] and Ekin and Tekalp [244] applied a color-based grass detector and field/object size to determine view types. Incorporating the previously mentioned features, Tong et al. [283] added head-area detection as well as a grey-level co-occurrence matrix (GLCM) to improve the decision tree classification. Wang et al. [284] used field region extraction, object segmentation, and edge detection for view type decision making. Duan et al. [282] first extended the research from a single genre (soccer) to multiple genres (four sports) using individual genre-based decision trees. Different from previous visual feature
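The decision-tree pipelines above mostly combine a color-based grass detector with field/object size cues. As a rough illustration in that spirit (not a reconstruction of any cited implementation), the sketch below labels a frame from its dominant-green ratio; the hue band and the hi/lo thresholds are placeholder values, whereas the cited works learn such splits with decision trees and add object-size, edge, and GLCM texture features.

```python
import numpy as np

def grass_ratio(frame_hsv):
    """Fraction of pixels whose hue/saturation fall in a green (grass) band.

    frame_hsv: (H, W, 3) uint8 image in OpenCV-style HSV (hue in [0, 180)).
    The hue band and saturation threshold are illustrative values only.
    """
    h, s = frame_hsv[..., 0], frame_hsv[..., 1]
    green = (h > 35) & (h < 85) & (s > 60)
    return green.mean()

def classify_view(frame_hsv, hi=0.60, lo=0.25):
    """Crude view-type decision in the spirit of the grass-detector methods."""
    r = grass_ratio(frame_hsv)
    if r > hi:
        return "global"    # field dominates the frame -> long shot
    if r > lo:
        return "zoom-in"   # partial field -> medium shot
    return "close-up"      # little or no field -> close-up / out-of-field
```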