Digital Signal Processing Reference
stage, precise video content associated with the semantic event has been detected
in terms of event boundary detection and accuracy analysis. For example,
Webcasting text has been studied for coarse-stage event detection and video
alignment, including the detection of replay scenes and of various goal and shot
scenes in soccer video [7, 61].
Since the proposed framework targets a generic learning model that can be
extended to large-scale data, we propose an HCRF-based structured prediction
model that utilizes the previously classified views and completes the generic
approach. For example, the HCRF model can be used to detect score events in
basketball as exciting events and highlights. Such an HCRF technique belongs to
the state event models defined in the related work. Therefore, the HCRF takes
the labeled sequences as input in a natural and seamless fashion. On the other
hand, the HCRF is a comprehensive model, which can be reduced to a hidden Markov
model (HMM) or a conditional random field (CRF) under certain constraints. The
merits of the HCRF compared with the other two models are its resilience and
robustness, which come from combining hidden states with a relaxation of the
Markov property.
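
As a point of reference, the HCRF literature commonly writes the class posterior
as a sum over hidden-state assignments. The form below is that standard
formulation, not an equation taken from this passage; the hidden-state sequence
h, the potential function \Psi, and the parameters \theta are notation assumed
here, with y, Y, and X denoting the event label, the label set, and the input
sequence.

```latex
P(y \mid X; \theta)
  = \sum_{h} P(y, h \mid X; \theta)
  = \frac{\sum_{h} \exp \Psi(y, h, X; \theta)}
         {\sum_{y' \in Y} \sum_{h} \exp \Psi(y', h, X; \theta)}
```

Read this way, the hidden states enter only through the sum over h, so a single
normalized score is obtained per event label for the whole sequence.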
There are several advantages of using the HCRF rather than HMM or CRF models on
large-scale datasets. Firstly, the HCRF relaxes the Markov property, which
assumes that the future state depends only on the current state. In our generic
framework, video frames are uniformly decimated and sampled, regardless of the
temporal pace of the video itself. In some cases, several consecutive frames
have the same label, while in other cases different labels are assigned. A
Markov property based model such as the HMM is appropriate for the former
scenario but not for the latter, since the future state in an HMM depends only
on the current state label and not on previous states. The HCRF, on the other
hand, is flexible and takes into account surrounding states both before and
after the current state. Thus, the HCRF is more robust in dealing with
large-scale homogeneous processes and uniform sampling with no prior knowledge.
For instance, if a key frame immediately preceding the current stage is missed
due to the uniform sampling, the information loss can be compensated for by
including and summing up earlier or later information, so that the event is not
misclassified. Secondly, the HCRF benefits from its hidden-state structure,
which relaxes the requirement for explicitly observed states. This is also an
advantage when dealing with large-scale, uniformly sampled video frames, because
in computation the CRF model outputs an individual result label (such as event
or not event) per state and requires a separate CRF to represent each possible
event [62]. In the HCRF, only one final result is produced, expressed as the
occurrence probabilities of the multi-class events. From the robustness point of
view, a CRF model can easily be ruined by semantically unrelated frames
introduced by the automatic uniform sampling. A multi-class HCRF, on the other
hand, can correct the errors introduced by such unrelated frames through its
probability-based outputs [49]. Moreover, the HCRF is appealing because it
allows the use of training data that is not explicitly labeled and has only
partial structure [49]. In the literature, the HCRF has been successfully used
in gesture recognition [49] and phone classification [12].
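
The idea of a single multi-class probability output per sequence can be
illustrated with a minimal sketch. It is not the authors' implementation: it
assumes the common chain-structured HCRF potential (observation, label, and
transition terms), and every name and parameter value in it (`theta`, `obs`,
`lab`, `trans`, `class_posterior`, the toy sequence `X`) is hypothetical. The
hidden states are summed out with a forward recursion, and a brute-force
enumeration is included only as a cross-check.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def node_score(theta, y, h, x_t):
    """Observation/hidden-state term plus label/hidden-state term (assumed form)."""
    return sum(w * f for w, f in zip(theta["obs"][h], x_t)) + theta["lab"][y][h]


def log_score_forward(theta, y, X):
    """log of sum over hidden sequences of exp(Psi(y, h, X)), by forward recursion."""
    H = len(theta["obs"])
    alpha = [node_score(theta, y, h, X[0]) for h in range(H)]
    for x_t in X[1:]:
        alpha = [
            node_score(theta, y, h, x_t)
            + logsumexp([alpha[g] + theta["trans"][y][g][h] for g in range(H)])
            for h in range(H)
        ]
    return logsumexp(alpha)


def log_score_brute(theta, y, X):
    """Same quantity by enumerating every hidden sequence (tiny inputs only)."""
    H = len(theta["obs"])
    totals = []
    for hs in product(range(H), repeat=len(X)):
        s = sum(node_score(theta, y, h, x_t) for h, x_t in zip(hs, X))
        s += sum(theta["trans"][y][g][h] for g, h in zip(hs, hs[1:]))
        totals.append(s)
    return logsumexp(totals)


def class_posterior(theta, X, score=log_score_forward):
    """One probability per event class for the whole sequence."""
    logs = [score(theta, y, X) for y in range(len(theta["lab"]))]
    log_z = logsumexp(logs)
    return [math.exp(v - log_z) for v in logs]


# Made-up parameters: 2 event classes, 2 hidden states, 2-dimensional frame features.
theta = {
    "obs": [[1.0, -1.0], [-1.0, 1.0]],    # obs[h] weights the frame features
    "lab": [[0.5, -0.5], [-0.5, 0.5]],    # lab[y][h] ties hidden states to classes
    "trans": [                            # trans[y][g][h] scores the move g -> h
        [[0.2, -0.2], [-0.2, 0.2]],
        [[0.1, 0.0], [0.0, 0.3]],
    ],
}

# Toy sequence of sampled frames; the third frame plays the role of an unrelated frame.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.1]]

posterior = class_posterior(theta, X)
posterior_check = class_posterior(theta, X, score=log_score_brute)
```

In this toy setting the whole sequence yields one probability vector over the
event classes, and the single out-of-pattern frame is absorbed by the sum over
hidden states instead of producing its own per-frame decision.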
Figure 4.12a illustrates an HCRF structure, in which a label y ∈ Y of event
type is predicted from an input X. This input consists of a sequence of vectors