Fig. 4.7 Some representative results. Note that here the eye density maps are not convolved with a Gaussian kernel, which is a popular method to recover more positive samples for the evaluation. (a) Original frames; (b) eye fixation maps; (c) [24]; (d) [19]; (e) [20]; (f) [16]; (g) [13]; (h) [15]; (i) [68]; (j) [46]; (k) [29]; (l) [44]; (m) our approach
Fig. 4.8 Targets and distractors in different scenes can be best distinguished by different features. (a), (b) the “motion” feature; (c), (d) the “color” feature
4.3.3.4 Multi-Task Rank Learning for Visual Saliency Estimation in Video
Generally speaking, a unified ranking function derived with the proposed approach can obtain impressive results in some cases but may suffer poor performance in others, since it constructs a single model for all scenes. In fact, the features that best distinguish targets from distractors may vary remarkably across scenes. In surveillance video, for instance, motion features can efficiently pop out a car or a walking person (as shown in Fig. 4.8a, b), whereas distinguishing a red apple or flower from its surroundings requires color contrasts (as shown in Fig. 4.8c, d). In most cases it is infeasible to pop out the targets and suppress the distractors with a fixed set of visual features. Therefore, it is necessary to construct scene-specific models that adaptively adopt different solutions for different scene categories.
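To make the pair-wise ranking formulation concrete, the following is a minimal sketch, not the authors' implementation: it trains a linear ranking function with a hinge loss over (target, distractor) region pairs by sub-gradient descent. The function names, feature layout, and hyper-parameters are illustrative assumptions.

```python
import numpy as np

def pairwise_rank_loss(w, features, target_idx, distractor_idx, margin=1.0):
    """Hinge loss over (target, distractor) pairs: each target region's
    saliency score w^T x should exceed each distractor's score by `margin`."""
    scores = features @ w                         # one linear saliency score per region
    losses = [max(0.0, margin - (scores[t] - scores[d]))
              for t in target_idx for d in distractor_idx]
    return float(np.mean(losses))

def train_ranker(features, target_idx, distractor_idx,
                 lr=0.01, epochs=200, reg=1e-3, margin=1.0):
    """Sub-gradient descent on the pair-wise hinge loss with L2 regularization."""
    w = np.zeros(features.shape[1])
    for _ in range(epochs):
        scores = features @ w
        grad = np.zeros_like(w)
        n_pairs = 0
        for t in target_idx:
            for d in distractor_idx:
                n_pairs += 1
                if scores[t] - scores[d] < margin:        # margin violated by this pair
                    grad -= features[t] - features[d]
        w -= lr * (grad / max(n_pairs, 1) + reg * w)
    return w

# Toy usage (hypothetical data): 3 contrast features (e.g. motion, color, intensity)
# for 6 regions; regions 0-1 attract fixations (targets), regions 2-5 do not.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
w = train_ranker(X, target_idx=[0, 1], distractor_idx=[2, 3, 4, 5])
saliency = X @ w          # higher score = estimated to be more salient
print(pairwise_rank_loss(w, X, [0, 1], [2, 3, 4, 5]))
```

Under this reading, the multi-task variant introduced next would maintain one such weight vector per scene cluster rather than a single global one.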
Toward this end, we propose a multi-task rank learning approach for visual saliency estimation. In this approach, visual saliency estimation is also formulated as a pair-wise rank learning problem. However, this approach constructs multiple visual saliency models, each for a scene cluster, by learning and integrating the features that best distinguish targets from distractors in that cluster. We also propose