For instance, a user may want to watch particular goals in basketball games, or replays
in soccer matches. The user is interested not only in information such as who, how, and
what, but, more importantly, in the visual content of the sports clips. Moreover,
sports videos have strongly correlated temporal structures. In a sense, such a
structure can be interpreted as a sequence of video frames that exhibit patterns
and internal connections. These patterns are ubiquitous because of the nature of sports:
competitions in which players learn from established standards in order to excel. Therefore,
an intuitive approach is to find such patterns using a suitable representation and, in
turn, to learn the temporal structure. Fortunately, the PLSA algorithm provides such a
labeled frame sequence. What remains is to analyze portions of the video and
determine which structured prediction model to use. In the following, we first
review the literature. Then, we introduce a coarse-to-fine scheme and a hidden
conditional random field (HCRF) for event detection.
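As a rough illustration of how a labeled frame sequence exposes temporal structure, the sketch below assumes that PLSA has already produced a per-frame topic posterior P(z | frame), stored as a list of probability lists. The function names `label_frames`, `to_segments`, and `transition_counts` and the posterior values are hypothetical. Each frame is labeled with its most probable latent topic, runs of identical labels are merged into segments, and transitions between consecutive segments are counted. Recurring transitions of this kind are the sort of pattern a structured prediction model could then learn from; this is not the chapter's own procedure.

```python
from collections import Counter
from itertools import groupby


def label_frames(posteriors):
    """Assign each frame its most probable latent topic (argmax of P(z | frame))."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def to_segments(labels):
    """Collapse runs of identical frame labels into (label, run_length) segments."""
    return [(label, len(list(run))) for label, run in groupby(labels)]


def transition_counts(segments):
    """Count how often one segment label is followed by another."""
    seg_labels = [label for label, _ in segments]
    return Counter(zip(seg_labels, seg_labels[1:]))


if __name__ == "__main__":
    # Hypothetical PLSA posteriors over 3 latent topics for 8 frames.
    posteriors = [
        [0.8, 0.1, 0.1],
        [0.7, 0.2, 0.1],
        [0.2, 0.7, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8],
        [0.6, 0.3, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.2, 0.7],
    ]
    labels = label_frames(posteriors)
    segments = to_segments(labels)
    print(labels)
    print(segments)
    print(transition_counts(segments))
```

On these made-up posteriors the frame labels are [0, 0, 1, 1, 2, 0, 1, 2], and the recurring 0, 1, 2 ordering appears as the two most frequent segment transitions, (0, 1) and (1, 2).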
9.3.1 Related Work
As one of the most popular semantic tasks in video analysis, event detection has
been an active topic since the beginning of multimedia research. Although
researchers define event detection differently, the commonly acknowledged
properties of an “event” can be summarized as follows: an event occupies a
period of time, is described using salient aspects of the input video sequence,
and consists of smaller semantic units or building blocks [266]. Lavee et al.
also summarized and classified event detection algorithms into three categories:
(a) pattern-recognition models, (b) semantic event models, and (c) state event
models. Pattern-recognition models focus on direct classification from low-level
features but lack semantic linkage. Semantic event models target high-level semantic
rules and constraints derived from domain knowledge. These models require substantial
human involvement to create rules and regulations from prior information. State event
models utilize abstracted mid-level agents as well as the intrinsic structure of the
event itself.
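To make the state-model category concrete, the following is a minimal sketch using a discrete hidden Markov model (HMM), one classic example of a state event model. It is not the HCRF this section goes on to discuss, which is a discriminative model with a different training procedure. Each hypothetical event class gets its own HMM over hidden mid-level states; the scaled forward algorithm scores an observed sequence of frame labels, and the event whose model gives the highest likelihood is chosen. The event names `"attack"` and `"idle"`, the functions `forward_log_likelihood` and `classify`, and all parameter values are invented for illustration.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete HMM.

    obs         = non-empty list of frame labels (integers)
    start[i]    = P(first hidden state is i)
    trans[i][j] = P(next state is j | current state is i)
    emit[i][o]  = P(frame label o | hidden state i)
    Each step is renormalised to avoid underflow; the logs of the
    normalisers sum to the log-likelihood.
    """
    n = len(start)
    log_lik = 0.0
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return -math.inf  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def classify(obs, models):
    """Pick the event whose HMM assigns obs the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical event models: 2 hidden states, 3 frame labels (0, 1, 2).
MODELS = {
    # Left-to-right: a state emitting mostly label 0, then one emitting labels 1 and 2.
    "attack": (
        [1.0, 0.0],
        [[0.7, 0.3], [0.0, 1.0]],
        [[0.8, 0.1, 0.1], [0.1, 0.5, 0.4]],
    ),
    # Both states mostly emit label 2.
    "idle": (
        [0.5, 0.5],
        [[0.5, 0.5], [0.5, 0.5]],
        [[0.1, 0.1, 0.8], [0.1, 0.1, 0.8]],
    ),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 2], MODELS))  # attack
    print(classify([2, 2, 2, 2], MODELS))  # idle
```

The hidden states play the role of the abstracted mid-level agents, while the transition matrix encodes the intrinsic temporal structure of each event; in practice the parameters would be learned from data rather than set by hand.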
Comparing these three categories of event modeling with examples in the
literature, we believe that pattern-recognition models depend heavily on
classifiers, which at present are not intelligent enough to capture all
semantics from low-level features. Semantic event models, on the other hand, rely
considerably on human expertise and thus underuse the accuracy and efficiency
provided by classification tools. In our experience, state event models combine
the strength of pattern recognition at the low level with classifiers at the high
level, so that they exploit both feature extraction power and classification
intelligence. Moreover, state event models accommodate automatic processing and
unsupervised learning, which reduces the human input required by the system.
Therefore, state event models are suitable for analyzing large-scale datasets, from
both generic and systematic points of view. A coarse-to-fine strategy fits well
into such state event models by first