Fig. 4.1 An example of the faceted representation of scene semantics
much semantic information from this scene alone: four persons (a man, a woman,
and two children) are waving at an assembly. With contextual knowledge, one
can further know that this scene shows B. Obama with his wife and two daughters
announcing his presidential campaign in Springfield, Illinois, USA. However,
the semantic descriptions that can be automatically inferred by a learning system
are very limited. For example, the scene might be categorized as “outdoor” or
“city” by image classification algorithms, or annotated with “crowd”, “flag”, and “building”
by automatic annotation models; the person in the scene might be recognized as
“B. Obama” by face recognition algorithms; we can also use object localization
algorithms to learn the spatial relationships of objects (e.g., B. Obama is at the center of the
picture); furthermore, high-level concept detection algorithms can be used to detect
activities (e.g., “waving”) or events (e.g., “assembly”). As shown in Fig. 4.1,
these semantics can be summarized along four aspects: which (semantic types or
categories), what (objects or scenes), where (spatial relationships), and how (actions,
activities, or events):
1. Which - Semantic Types and Categories: The which facet typically refers to the
semantic types or categories of scenes. Given a taxonomy, this facet helps answer
the question: which type or category does the scene belong to? Describing scene
semantics using the which facet is very general, but it proves to be of great importance
both for organizing unseen images/scenes into broad categories and for
semantic-based retrieval from large-scale collections.
2. What - Objects and Scenes: The what facet describes the objects and scenes in an
image/video. It answers the question: what is the subject (object/scene, etc.) in it?
Extracting the what facet from scenes covers a wide range of visual learning