An interesting event tactic analysis is proposed by Zhu et al. [247], which goes beyond conventional event detection and exploits the cooperative nature and tactic patterns of team sports. Extensive experiments have been conducted on soccer.
Table 9.2 compares the aforementioned literature from the point of view of feature utilization. Most of the methods take multimodal feature inputs. Comparing the number of events processed suggests that the state event model scales better across various event scenarios. It is also worth noting that local visual features are not utilized in any of the methods. In addition, many of the methods, especially state event models, require middle-level semantic agents to bridge the gap between low-level features and high-level events, and these middle-level agents must be trained on labeled data. In contrast, the generic method presented in this work tackles the event detection problem using inputs obtained by unsupervised learning from unlabeled data.
9.3.2 Middle-Level Unsupervised View Classification
Once the video genre is identified, the next step is to classify the view type of each video frame in the query sequence. We first present a literature review, followed by the proposed unsupervised method.
9.3.2.1 Related Work
We summarize related works so that readers can compare the popular supervised approaches with the proposed unsupervised PLSA method. Based on our study, only two works use unsupervised techniques; we present them [280, 281] for completeness of the review.
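The proposed PLSA-based classifier is detailed later in the chapter; as orientation only, the sketch below shows how a plain PLSA model can be fitted with EM when frames are treated as documents, quantized low-level features as visual words, and latent aspects stand in for view types. The helper build_codebook_histograms, the three-aspect setting, and the iteration count are illustrative assumptions, not the chapter's implementation.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0, eps=1e-12):
    """Fit PLSA by EM to a frame-by-visual-word count matrix.

    counts  : (n_frames, n_words) array, n(d, w) = occurrences of visual
              word w in frame d (e.g. from a quantized color/texture codebook)
    n_topics: number of latent aspects (here: candidate view types)
    Returns P(z|d) and P(w|z).
    """
    rng = np.random.default_rng(seed)
    n_frames, n_words = counts.shape

    # Random initialization of the two conditional distributions.
    p_z_d = rng.random((n_frames, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)            # P(z|d)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)            # P(w|z)

    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w) ∝ P(z|d) * P(w|z).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]  # (frames, words, topics)
        joint /= joint.sum(axis=2, keepdims=True) + eps

        # M-step: re-estimate P(w|z) and P(z|d) from expected counts.
        expected = counts[:, :, None] * joint            # n(d,w) * P(z|d,w)
        p_w_z = expected.sum(axis=0).T                   # sum over frames
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = expected.sum(axis=1)                     # sum over words
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z

# Each frame is then assigned the aspect with the highest posterior P(z|d);
# aspects can be mapped to view labels (global / zoom-in / close-up) by
# inspecting a few frames per aspect.
# counts = build_codebook_histograms(frames)   # hypothetical helper
# p_z_d, _ = plsa(counts, n_topics=3)
# view_of_frame = p_z_d.argmax(axis=1)
```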
Although the nomenclature varies, the fundamental purpose of middle-level views (shots) is to exploit certain production rules to aid high-level tasks. This frame-based label concept was first introduced by Xu et al., who defined three groups of views: global, zoom-in, and close-up [243]. Ekin and Tekalp [244] used a slightly different notation comprising long-shot, middle-shot, and close-up/out-of-field. Duan et al. [282] used a finer view/shot group classification, supported by innovative semantic features. These pioneering methods, along with other works such as [283-285], rely on decision tree classifiers to link low-level features to view/shot types. Xu et al. [243] and Ekin and Tekalp [244] applied a color-based grass detector and field/object size to determine view types. Incorporating the previously mentioned features, Tong et al. [283] added head-area detection as well as a grey-level co-occurrence matrix (GLCM) to improve the decision tree classification. Wang et al. [284] used field region extraction, object segmentation, and edge detection for view type decision making. Duan et al. [282] first extended the research from a single genre (soccer) to multiple genres (four sports) using individual genre-based decision trees. Different from previous visual feature
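The decision-tree pipelines above mostly combine a color-based grass detector with field/object size cues. As a rough illustration in that spirit (not a reconstruction of any cited implementation), the sketch below labels a frame from its dominant-green ratio; the hue band and the hi/lo thresholds are placeholder values, whereas the cited works learn such splits with decision trees and add object-size, edge, and GLCM texture features.

```python
import numpy as np

def grass_ratio(frame_hsv):
    """Fraction of pixels whose hue/saturation fall in a green (grass) band.

    frame_hsv: (H, W, 3) uint8 image in OpenCV-style HSV (hue in [0, 180)).
    The hue band and saturation threshold are illustrative values only.
    """
    h, s = frame_hsv[..., 0], frame_hsv[..., 1]
    green = (h > 35) & (h < 85) & (s > 60)
    return green.mean()

def classify_view(frame_hsv, hi=0.60, lo=0.25):
    """Crude view-type decision in the spirit of the grass-detector methods."""
    r = grass_ratio(frame_hsv)
    if r > hi:
        return "global"    # field dominates the frame -> long shot
    if r > lo:
        return "zoom-in"   # partial field -> medium shot
    return "close-up"      # little or no field -> close-up / out-of-field
```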