9.3.3 High-Level Event Detection
9.3.3.1 Hidden Conditional Random Field (HCRF) Model
Before the temporal patterns can be learned, the starting point of an event must be
located. A two-stage coarse-to-fine event detection strategy suits this scenario. The
first stage performs rough event recognition and localization using rich and accurate
text-based information, obtained either from web-casting text or from optical character
recognition (OCR) of scoreboard updates. The second stage detects the precise
video content associated with the semantic event, in terms of event boundary
detection and accuracy analysis. Coarse-to-fine techniques have been shown to be
effective and accurate [290]. Web-casting text has been studied for coarse-stage event
detection and video alignment, for example in detecting replay scenes and various
goal and shot scenes in soccer video [291, 292].
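The coarse stage described above can be sketched in a few lines. This is a minimal illustration, not the method of [290]: it assumes a hypothetical OCR step has already produced (time, score) readings, and the function names and window lengths (`before`, `after`) are invented for the example.

```python
from typing import List, Optional, Tuple


def coarse_candidates(readings: List[Tuple[float, Optional[int]]]) -> List[float]:
    """Return the timestamps at which the recognised score increases.

    readings: (time_in_seconds, score) pairs in time order. The score is
    None when the (hypothetical) OCR step could not read the scoreboard.
    """
    candidates = []
    last_score = None
    for t, score in readings:
        if score is None:
            continue  # skip unreadable frames
        if last_score is not None and score > last_score:
            candidates.append(t)  # scoreboard update: coarse event location
        last_score = score
    return candidates


def fine_window(t: float, before: float = 10.0, after: float = 2.0) -> Tuple[float, float]:
    """Video interval handed to the fine stage for one candidate.

    The window lengths are illustrative assumptions, not values from the text.
    """
    return (max(0.0, t - before), t + after)


readings = [(0.0, 10), (1.0, 10), (2.0, None), (3.0, 12), (4.0, 12), (5.0, 13)]
candidates = coarse_candidates(readings)
windows = [fine_window(t) for t in candidates]
```

Each window then goes to the fine stage, which analyzes the visual content around the candidate time stamp.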
Since the proposed framework targets a generic learning model that can be
extended to large-scale datasets, we rely on visual content, that is, the extracted
local features and the middle-level views classified from them. To demonstrate the
effectiveness of the proposed model, we focus on one particular task, the detection
of score events in basketball. For the coarse stage, we adopt the previously developed
scoreboard update detection method to obtain the time stamp [290]. The
fine stage focuses on robust and accurate detection of the visual content of
the score event. The video sequence is analyzed to distinguish actual score
events from false alarms, such as timeouts or intermissions, which also
coincide with scoreboard information. We propose an HCRF-based structured
prediction model that uses the previously classified views, thereby completing the
generic approach. For example, the HCRF model can be used to detect score
events in basketball as exciting events and highlights. The HCRF technique
belongs to the state event models defined in related work, so it takes
labeled sequences as input in a natural and seamless fashion. Moreover,
HCRF is a general model that reduces to a hidden Markov model
(HMM) or a conditional random field (CRF) under certain constraints. Its merits
over the other two models are resilience and robustness, which come from
combining hidden states with a relaxation of the Markov property. The technical
details are examined below.
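To make the role of the hidden states concrete, the sketch below computes the class posterior of a small chain-structured HCRF by brute-force enumeration of every hidden-state sequence. It follows the standard HCRF form, in which P(y | x) is proportional to the sum over hidden sequences h of exp(Ψ(y, h, x)). It is not the implementation used in this work: the decomposition of Ψ into the three weight tables, their names, and the random untrained weights are all assumptions made for illustration. Enumeration is exponential in the sequence length, so practical implementations use dynamic programming or belief propagation instead.

```python
import itertools

import numpy as np


def hcrf_posterior(x, w_obs, w_lab, w_edge):
    """P(y | x) for a chain HCRF, by enumerating all hidden sequences.

    x      : (T,) int array of per-frame view labels (the observations)
    w_obs  : (H, V) weight for hidden state h paired with view label v
    w_lab  : (Y, H) weight for hidden state h under sequence label y
    w_edge : (Y, H, H) weight for transition h_prev -> h under label y
    """
    T = len(x)
    Y, H = w_lab.shape
    log_scores = np.empty(Y)
    for y in range(Y):
        potentials = []
        for h in itertools.product(range(H), repeat=T):
            # Node terms: observation compatibility and label compatibility.
            psi = sum(w_obs[h[t], x[t]] + w_lab[y, h[t]] for t in range(T))
            # Edge terms: label-dependent hidden-state transitions.
            psi += sum(w_edge[y, h[t - 1], h[t]] for t in range(1, T))
            potentials.append(psi)
        # Marginalise the hidden states in log space.
        log_scores[y] = np.logaddexp.reduce(potentials)
    # Normalise over the sequence labels.
    log_scores -= np.logaddexp.reduce(log_scores)
    return np.exp(log_scores)


# Toy setting (assumed): 3 view labels, 2 hidden states,
# 2 sequence labels (score event vs. false alarm).
rng = np.random.default_rng(0)
V, H, Y = 3, 2, 2
w_obs = rng.normal(size=(H, V))
w_lab = rng.normal(size=(Y, H))
w_edge = rng.normal(size=(Y, H, H))
x = np.array([0, 0, 1, 2, 2])  # classified view label of each sampled frame
p = hcrf_posterior(x, w_obs, w_lab, w_edge)
```

The returned vector holds one probability per sequence label. In a trained model, the label with the larger posterior would decide whether the candidate is an actual score event or a false alarm.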
Using HCRF on large-scale datasets has several advantages over
HMM or CRF models. First, HCRF relaxes the Markov property, which assumes
that the future state depends only on the current state. In our generic framework,
video frames are uniformly decimated and sampled, regardless of the temporal
pace of the video itself. In some cases, several consecutive frames receive the same
label, while in other cases they receive different labels. Markov property-based
models such as HMM are appropriate for the former scenario but not
for the latter, since the future state in an HMM depends only on the current
state label and not on earlier states. HCRF, by contrast, is flexible and takes into
account surrounding states both before and after the current state. Thus, HCRF is more
robust for dealing with large-scale homogeneous processes and uniform sampling