Fig. 4.7 Some representative results. Note that here the eye density maps are not convolved with a Gaussian kernel, which is a popular method to recover more positive samples for the evaluation. (a) Original frames; (b) eye fixation maps; (c) [24]; (d) [19]; (e) [20]; (f) [16]; (g) [13]; (h) [15]; (i) [68]; (j) [46]; (k) [29]; (l) [44]; (m) our approach
Fig. 4.8 Targets and distractors in different scenes can be best distinguished by different features. (a), (b) the "motion" feature; (c), (d) the "color" feature
4.3.3.4 Multi-Task Rank Learning for Visual Saliency Estimation in Video
Generally speaking, a unified ranking function derived with the proposed approach can obtain impressive results in some cases but may suffer poor performance in others, since it constructs a single model for all scenes. In fact, the features that best distinguish targets from distractors may vary remarkably across scenes. In surveillance video, for instance, motion features can efficiently pop out a car or a walking person (as shown in Fig. 4.8a, b), whereas color contrast should be used to distinguish a red apple or flower from its surroundings (as shown in Fig. 4.8c, d). In most cases it is infeasible to pop out the targets and suppress the distractors with a single fixed set of visual features. Therefore, it is necessary to construct scene-specific models that adaptively adopt different solutions for different scene categories.
Toward this end, we propose a multi-task rank learning approach for visual saliency estimation. In this approach, visual saliency estimation is again formulated as a pair-wise rank learning problem. However, the approach constructs multiple visual saliency models, one for each scene cluster, by learning and integrating the features that best distinguish targets from distractors in that cluster. We also propose
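To make the pair-wise formulation concrete, the following is a minimal sketch of how one cluster's ranking function might be trained: a linear model w is fit with a pairwise hinge loss so that target regions score above distractor regions by a margin. The function names, the linear model, the margin value, and the subgradient training loop are illustrative assumptions, not the exact learning algorithm of the approach described here.

```python
# Minimal sketch of pair-wise rank learning for one scene cluster.
# Assumption: a linear ranking function trained with a pairwise hinge
# loss; the actual approach may use a different learner and features.
import numpy as np

def train_cluster_ranker(targets, distractors, lr=0.01, epochs=100, margin=1.0):
    """Learn a weight vector w for one scene cluster so that
    w . x_target > w . x_distractor + margin for target/distractor pairs.

    targets, distractors: (n, d) arrays of per-region feature vectors
    (e.g., motion and color contrasts).
    """
    d = targets.shape[1]
    w = np.zeros(d)
    for _ in range(epochs):
        for x_t in targets:
            for x_d in distractors:
                # Pairwise hinge loss: update only when the pair is
                # ranked in the wrong order (or within the margin).
                if w @ x_t - w @ x_d < margin:
                    w += lr * (x_t - x_d)  # subgradient step
    return w

def saliency(w, features):
    """Score each region; higher means more salient under this cluster's model."""
    return features @ w
```

In this scheme, each scene cluster gets its own w, so a surveillance cluster can end up weighting motion features heavily while a natural-scene cluster weights color contrast, matching the examples in Fig. 4.8.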