Scalable Video Genre Classification and Event Detection - Multimedia Database Retrieval: Technology and Applications

For instance, a user may want to watch particular goals in basketball games, or replays

in soccer matches. The user is interested not only in information such as who/how/what,

but, more importantly, in the visual content rendered from the sports clips. On the other

hand, sports videos also have very strongly correlated temporal structures. In a way,

the structure can be interpreted as a sequence of video frames which have patterns

and internal connections. This pattern is ubiquitous due to the nature of sports, a

competition where players learn from the standard in order to excel. Therefore,

an intuitive approach is to find such patterns using a suitable representation and, in

turn, to learn the temporal structure. Luckily, the PLSA algorithm provides such a

labeled frame sequence. What we need is a clever technique to analyze portions of

the video and determine what structured prediction model to use. In the following,

we will first review the literature. Then, we will introduce a coarse-to-fine scheme

and hidden conditional random field (HCRF) for event detection.
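
To make the idea of a labeled frame sequence with temporal structure concrete, the following is a minimal sketch and not the method developed in this chapter. It assumes each frame already carries a discrete label (for example, one produced by a topic model such as PLSA), collapses consecutive identical labels into segments, and counts how often one segment label follows another. The label names and the helper functions `collapse_runs` and `transition_counts` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import groupby


def collapse_runs(frame_labels):
    """Collapse consecutive identical frame labels into (label, start, length) segments."""
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start, length))
        start += length
    return segments


def transition_counts(segments):
    """Count how often one segment label is directly followed by another."""
    labels = [label for label, _, _ in segments]
    return Counter(zip(labels, labels[1:]))


# Hypothetical per-frame labels; the names are assumptions made for illustration.
frames = ["court", "court", "court", "closeup", "closeup",
          "replay", "court", "court", "closeup"]

segments = collapse_runs(frames)
transitions = transition_counts(segments)

for label, start, length in segments:
    print(f"{label}: frames {start}..{start + length - 1}")
print(dict(transitions))
```

Recurring transitions between segment labels are one simple view of the temporal pattern that a structured prediction model would then be asked to learn.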

9.3.1 Related Work

As one of the central semantic tasks in video analysis, event detection has

been a popular topic since the beginning of multimedia research. Despite different

definitions of event detection by different researchers, commonly acknowledged

properties of an “event” can be summarized as follows. An event occupies a

period of time and is described using salient aspects of the video sequence input,

which consists of smaller semantic units or building blocks [266]. Lavee et al.

also summarized and classified event detection algorithms into three categories:

(a) pattern-recognition models, (b) semantic event models, and (c) state event

models. Pattern-recognition models focus on direct classification from low-level

features, but lack semantic linkage. Semantic models target high-level semantic

rules and constraints with domain knowledge. These models require a lot of human

involvement in creating rules and regulations using prior information. State models

utilize abstracted middle-level agents, as well as the intrinsic structure of the event

itself.

By comparing these three categories of event modeling with examples in the

literature, we think that the pattern-recognition model is heavily dependent on

classifiers, which, at the moment, are not intelligent enough to understand all

semantics from low-level features. On the other hand, the semantic model relies

considerably on human expertise and thus underestimates the accuracy and efficiency

provided by classification tools. From our experience, the state model incorporates

the strength of pattern recognition at the low level with classifiers at the high level so that it

utilizes both feature extraction power and classification intelligence. Moreover, the

state model also accommodates an automatic process and unsupervised learning,

which reduces human input into the system. Therefore, state event models are

suitable for analyzing large-scale datasets, from both generic and systematic points

of view. A coarse-to-fine strategy fits well into such state event models, by first
