Digital Signal Processing Reference
quest, which could come from a ball game or a shooting sport. If the entire dataset
is treated indiscriminately, this event will be searched for across all types of sports.
However, since sports such as figure skating and swimming involve no “shooting” at all,
the effort spent searching for this event in those irrelevant sports is wasted.
Instead of treating all data identically, a more efficient approach is to first identify
the genre of the query video and then deploy the middle- and high-level tasks accordingly.
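The genre-first idea can be illustrated with a minimal Python sketch. It assumes that each video already carries a genre label predicted by the first module; the genre-to-event table, the `genre` field, and the function names are hypothetical and serve only to show how genre identification narrows the search.

```python
from typing import Dict, List, Set

# Hypothetical table: which events can occur in which sports genre.
GENRE_EVENTS: Dict[str, Set[str]] = {
    "soccer": {"shooting", "goal", "penalty kick"},
    "basketball": {"shooting", "free throw"},
    "figure skating": {"jump", "spin"},
    "swimming": {"start", "turn"},
}


def relevant_genres(event: str) -> Set[str]:
    """Return the genres in which the requested event can occur."""
    return {genre for genre, events in GENRE_EVENTS.items() if event in events}


def select_videos(videos: List[dict], event: str) -> List[dict]:
    """Keep only videos whose (already predicted) genre can contain the event."""
    genres = relevant_genres(event)
    return [video for video in videos if video["genre"] in genres]


if __name__ == "__main__":
    dataset = [
        {"id": 1, "genre": "soccer"},
        {"id": 2, "genre": "swimming"},
        {"id": 3, "genre": "basketball"},
        {"id": 4, "genre": "figure skating"},
    ]
    # Only the soccer and basketball videos are passed on to event detection.
    print([video["id"] for video in select_videos(dataset, "shooting")])
```

With this gating step, the middle- and high-level modules never run on genres in which the requested event cannot occur.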
In the second, middle-level module, semantic view types are classified using
an unsupervised PLSA learning method to provide labels for input video frames.
A view describes an individual video frame by abstracting its overall content. It is
treated as a bridge between low-level visual features and high-level semantic
understanding. In addition, unsupervised learning saves a massive amount of human
effort in processing large-scale data. Moreover, supervised methods can also be
implemented on our proposed platform. Therefore, an SVM model is executed as
the baseline for comparison.
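As a rough illustration of unsupervised view labeling, the following Python sketch fits a generic PLSA model by expectation-maximization and labels each frame with its most probable latent topic. It is not the configuration used in this chapter: it assumes that every frame has already been quantized into a histogram of visual words, and the function names, the toy counts, and the use of the latent topic as the view label are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def _normalize(row: List[float]) -> List[float]:
    """Scale a non-negative vector to sum to 1 (uniform if it is all zeros)."""
    total = sum(row)
    if total <= 0.0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts: List[List[int]], n_topics: int,
         n_iter: int = 200, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Fit PLSA by EM.

    counts[d][w] is the count of visual word w in frame d (assumed input).
    Returns (p_z_d, p_w_z): P(topic | frame) and P(word | topic).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [_normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w) for this frame/word pair.
                post = _normalize([p_z_d[d][z] * p_w_z[z][w]
                                   for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_zd[d][z] += n * post[z]
                    new_wz[z][w] += n * post[z]
        # M-step: renormalize the expected counts into distributions.
        p_z_d = [_normalize(row) for row in new_zd]
        p_w_z = [_normalize(row) for row in new_wz]
    return p_z_d, p_w_z


def view_labels(p_z_d: Matrix) -> List[int]:
    """Label each frame with its most probable latent topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in p_z_d]


if __name__ == "__main__":
    # Toy histograms: frames 0-1 and frames 2-3 use disjoint visual words.
    toy_counts = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [0, 0, 4, 5]]
    frame_topics, _ = plsa(toy_counts, n_topics=2)
    print(view_labels(frame_topics))
```

Topic indices produced this way are arbitrary, so mapping them to named view types requires a separate step that this sketch does not cover.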
Finally, in the third module, a structured prediction HCRF model using the labeled
inputs is a natural fit for detecting semantic events. This is justified
by the fact that a video event can span varying lengths along the temporal dimension.
Thus, the state event model-based HCRF is suitable to deploy. Less comprehensive
baseline methods such as the hidden Markov model and the conditional random
field can also be applied on this platform.
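To make the variable-length argument concrete, the sketch below implements the simpler hidden Markov model baseline rather than the HCRF itself. Each candidate event has its own discrete HMM over view labels, and a labeled frame sequence of any length is assigned to the model with the highest forward-algorithm likelihood. The view-label coding (0 = global view, 1 = close-up, 2 = replay), the event names, and all probabilities are hypothetical values chosen for illustration.

```python
import math
from typing import Dict, List, Sequence


class DiscreteHMM:
    """Discrete-observation HMM scored with the scaled forward algorithm."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 emit: List[List[float]]) -> None:
        self.start = start  # start[i]    = P(first state is i)
        self.trans = trans  # trans[i][j] = P(next state j | current state i)
        self.emit = emit    # emit[i][o]  = P(observing label o | state i)

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Return log P(obs | model) for a non-empty label sequence."""
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        log_p = 0.0
        for t in range(len(obs)):
            if t > 0:
                # Propagate one step, then weight by the emission probability.
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n))
                    * self.emit[j][obs[t]]
                    for j in range(n)
                ]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")
            # Rescale to avoid underflow on long sequences.
            log_p += math.log(scale)
            alpha = [a / scale for a in alpha]
        return log_p


def classify(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Pick the event model under which the sequence is most likely."""
    return max(models, key=lambda name: models[name].log_likelihood(obs))


# Hypothetical models: a left-to-right "goal" pattern (global view, then
# close-up, then replay) and a one-state "no_event" background model.
MODELS: Dict[str, DiscreteHMM] = {
    "goal": DiscreteHMM(
        start=[1.0, 0.0, 0.0],
        trans=[[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],
        emit=[[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
    ),
    "no_event": DiscreteHMM(
        start=[1.0], trans=[[1.0]], emit=[[0.9, 0.05, 0.05]],
    ),
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 1, 2, 2, 2], MODELS))
    print(classify([0, 1, 2], MODELS))
    print(classify([0, 0, 0, 0], MODELS))
```

Because the forward recursion runs once per frame, the same model scores short and long sequences without padding or resampling, which is the property the structured models in this module rely on.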
4.4.1.2 High-Level Event Detection for Sports Video
Content-based video event detection is among the most popular quests in high-level
semantic analysis. Unlike video abstraction and summarization, which
target any interesting event happening in a video rush, event detection is
constrained to a pre-defined request type, such as the third goal or the second penalty
kick in a particular soccer match. In sports video, a consumer's interest in events
resides in the actual video content, more than in the information delivered. On the
other hand, sports videos also have a strongly correlated temporal structure.
In a way, such a structure can be interpreted as a sequence of video frames that
have patterns and internal connections. These patterns are ubiquitous due to
the nature of sports, a competition where players learn from the standard in
order to excel. Therefore, an intuitive approach is to find such patterns using a certain
representation and learn the temporal structure. Fortunately, the PLSA approach
provides such a labeled frame sequence; what remains is a technique for choosing
which portion of the video to analyze and a robust structured prediction model
to use. In the following, we introduce a coarse-to-fine scheme and the hidden conditional
random field (HCRF) for event detection.
Before the tempo and patterns can be learned, the starting (entry) point of an event
needs to be located. A two-stage coarse-to-fine event detection strategy is suitable
for this scenario. The first stage is a rough event recognition and localization that
uses rich and accurate text-based information, obtained either from web-casting text
or from optical character recognition (OCR) of score-board updates. In the second