Digital Signal Processing Reference
Fig. 4.1 An example of the faceted representation of scene semantics
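To make the faceted representation in Fig. 4.1 concrete, the scene discussed in the text could be sketched as a simple data structure. This is only an illustrative sketch; the field names and query helper below are hypothetical, not part of any system described in the text.

```python
# A minimal, hypothetical sketch of a faceted scene description,
# populated with the example scene discussed in the text.
scene_semantics = {
    "which": ["outdoor", "city"],                       # semantic types or categories
    "what":  ["crowd", "flag", "building", "B. Obama"], # objects or scenes
    "where": {"B. Obama": "center-of-picture"},         # spatial relationships
    "how":   {"activities": ["waving"],                 # actions, activities, events
              "events": ["assembly"]},
}

def matches_category(semantics, category):
    """Illustrative helper: check whether a scene falls under a given
    'which'-facet category, as in semantic-based retrieval."""
    return category in semantics.get("which", [])
```

A faceted structure like this makes category-based queries trivial, e.g. `matches_category(scene_semantics, "outdoor")` returns `True` for this scene.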
much semantic information from this scene alone: four persons (a man, a woman,
and two children) are waving in an assembly. With contextual knowledge, one
can further recognize that this scene shows B. Obama with his wife and two daughters
announcing his presidential campaign in Springfield, Illinois, USA. However,
the semantic descriptions that can be automatically inferred by a learning system
are very limited. For example, the scene might be categorized as "outdoor" or
"city" by image classification algorithms, or annotated with "crowd", "flag", and
"building" by automatic annotation models; the person in the scene might be recognized as
"B. Obama" by face recognition algorithms; object localization algorithms
can also learn the spatial relationships of objects (e.g., B. Obama is at the center of the
picture); furthermore, high-level concept detection algorithms can be used to detect
activities (e.g., "waving") or events (e.g., "assembly"). As shown in Fig. 4.1,
these semantics can be summarized along four aspects: which (semantic types or
categories), what (objects or scenes), where (spatial relationships), and how (actions,
activities, or events):
1. Which - Semantic Types and Categories: The which facet typically refers to
the semantic types or categories of scenes. Given a taxonomy, this facet helps
answer the question: which type or category does the scene belong to? Describing scene
semantics using the which facet is very general, but proves to be of great
importance both for organizing unseen images/scenes into broad categories and for
semantic-based retrieval from large-scale collections.
2. What - Objects and Scenes: The what facet describes the objects and scenes in an
image/video. It answers the question: what is the subject (object/scene, etc.) in it?
Extracting the what facet from scenes covers a wide range of visual learning