Scalable Video Genre Classification and Event Detection - Multimedia Database Retrieval: Technology and Applications


9.3.3 High-Level Event Detection

9.3.3.1 Hidden Conditional Random Field (HCRF) Model

Before temporal patterns can be learned, the starting point of an event must first be located. A two-stage, coarse-to-fine event detection strategy suits this scenario. The first stage performs rough event recognition and localization using rich and accurate text-based information, obtained either from web-casting text or from optical character recognition (OCR) of scoreboard updates. In the second stage, the video content associated with the semantic event is precisely located through event boundary detection and accuracy analysis. Coarse-to-fine techniques have proven effective and accurate [290]. Web-casting text has also been studied for coarse-stage event detection and video alignment, for example to detect replay scenes and various goal and shot scenes in soccer video [291, 292].
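The two-stage strategy can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the coarse stage is assumed to deliver scoreboard-update time-stamps in seconds, each time-stamp is expanded into a frame window of hypothetical width (`before_s`, `after_s`), and the fine stage is represented by a pluggable `is_event` classifier over the window's per-frame view labels (in this section, that role is played by the HCRF). The helper names and the view labels `"court"` and `"crowd"` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, int]  # half-open frame range [start, end)


def coarse_candidates(update_times_s: Sequence[float], fps: float,
                      before_s: float, after_s: float,
                      n_frames: int) -> List[Window]:
    """Coarse stage (hypothetical helper): turn each scoreboard-update
    time-stamp into a candidate frame window around it."""
    windows: List[Window] = []
    for t in update_times_s:
        start = max(0, int((t - before_s) * fps))
        end = min(n_frames, int((t + after_s) * fps))
        if start < end:  # drop windows that fall entirely outside the video
            windows.append((start, end))
    return windows


def fine_stage(windows: Sequence[Window], frame_labels: Sequence[str],
               is_event: Callable[[Sequence[str]], bool]) -> List[Window]:
    """Fine stage (hypothetical helper): keep only the windows whose view-label
    sequence the classifier accepts as a true event, rejecting false alarms."""
    return [(s, e) for (s, e) in windows if is_event(frame_labels[s:e])]


if __name__ == "__main__":
    # Toy per-frame view labels at 25 fps (8 seconds of video).
    labels = ["court"] * 100 + ["crowd"] * 50 + ["court"] * 50
    wins = coarse_candidates([3.0, 6.0], fps=25.0, before_s=1.0,
                             after_s=1.0, n_frames=len(labels))
    print(wins)  # [(50, 100), (125, 175)]
    # Toy stand-in for the fine-stage model: majority of "court" views.
    kept = fine_stage(wins, labels, lambda seq: seq.count("court") > len(seq) // 2)
    print(kept)  # [(50, 100)]
```

The toy majority rule in the usage example only makes the data flow concrete; keeping the classifier as a parameter lets the same coarse stage feed any fine-stage model.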

Since the proposed framework targets a generic learning model that can be extended to large-scale datasets, we rely on visual content, that is, the extracted local features and the mid-level views classified from them. To demonstrate the effectiveness of the proposed model, we focus on one particular task: score event detection in basketball. For the coarse stage, we adopt the previously developed scoreboard update detection method to obtain event time-stamps [290]. The fine stage focuses on robust and accurate detection of the visual content of the score event. The video sequence is analyzed to distinguish actual score events from false alarms, such as timeouts or intermissions, which also coincide with scoreboard information. We propose an HCRF-based structured prediction model that takes the previously classified views as input, thereby completing the generic approach. For example, the HCRF model can be used to detect basketball score events in order to extract exciting moments and highlights. This HCRF technique belongs to the class of state event models defined in related work; therefore, HCRF takes labeled sequences as input in a natural and seamless fashion. Moreover, HCRF is a general model that reduces to a hidden Markov model (HMM) or a conditional random field (CRF) under certain constraints. Compared with these two models, HCRF is more resilient and robust because it combines hidden states with a relaxation of the Markov property. The technical details are examined below.
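HCRF inference can be illustrated with the standard formulation from the literature, P(y | x) = Σ_h exp Ψ(y, h, x) / Σ_{y', h} exp Ψ(y', h, x), where y is the label of the whole sequence (for example, score event versus false alarm), x is the sequence of classified views, and h is a sequence of unobserved hidden states. The sketch below assumes a chain-structured hidden layer and a potential Ψ built from three hypothetical tables (view-to-state, label-to-state, and label-dependent state transitions). The parameter values are hand-picked placeholders rather than learned weights, and the authors' exact parameterization and training procedure are not given in this section.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_label_score(y: int, views: Sequence[int], obs: Matrix,
                    label_hidden: Matrix, trans: List[Matrix]) -> float:
    """log sum_h exp(Psi(y, h, x)) for a chain of hidden states h_1..h_T,
    computed with the forward recursion instead of enumerating every h.

    obs[h][v]          : compatibility of hidden state h with view index v
    label_hidden[y][h] : compatibility of sequence label y with hidden state h
    trans[y][a][b]     : label-dependent transition score from state a to b
    """
    n_hidden = len(obs)
    alpha = [obs[h][views[0]] + label_hidden[y][h] for h in range(n_hidden)]
    for v in views[1:]:
        alpha = [
            logsumexp([alpha[a] + trans[y][a][b] for a in range(n_hidden)])
            + obs[b][v] + label_hidden[y][b]
            for b in range(n_hidden)
        ]
    return logsumexp(alpha)


def hcrf_posterior(views: Sequence[int], obs: Matrix, label_hidden: Matrix,
                   trans: List[Matrix]) -> List[float]:
    """P(y | x) for every sequence label y, marginalizing the hidden states."""
    scores = [log_label_score(y, views, obs, label_hidden, trans)
              for y in range(len(label_hidden))]
    log_z = logsumexp(scores)
    return [math.exp(s - log_z) for s in scores]


if __name__ == "__main__":
    # Hypothetical labels: 0 = score event, 1 = false alarm (e.g. timeout).
    # Hypothetical views: 0 = court, 1 = close-up, 2 = crowd; 2 hidden states.
    obs = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.0]]
    label_hidden = [[0.2, -0.1], [-0.4, 0.3]]
    trans = [[[0.1, -0.2], [0.0, 0.3]], [[-0.1, 0.2], [0.4, -0.3]]]
    print(hcrf_posterior([0, 2, 1], obs, label_hidden, trans))  # [P(score), P(false alarm)]
```

Because the hidden-state chain is summed out with the forward recursion, the cost grows linearly with the sequence length rather than exponentially in it. In practice the three tables would be learned, for example by maximizing the conditional log-likelihood with a gradient-based optimizer.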

Using HCRF on large-scale datasets, rather than HMM or CRF models, has several advantages. First, HCRF relaxes the Markov property, which assumes that the future state depends only on the current state. In our generic framework, video frames are uniformly decimated and sampled, regardless of the temporal pace of the video itself. In some cases, several consecutive frames share the same label, while in other cases they are assigned different labels. Models based on the Markov property, such as HMM, are appropriate for the former scenario but not for the latter, since the next state in an HMM depends only on the current state label and not on earlier states. HCRF, by contrast, is flexible and takes into account surrounding states both before and after the current state. Thus, HCRF is more robust for dealing with large-scale homogeneous processes and uniform sampling
