the video nodes, inducing an order for the videos based on the corresponding node
activations at each stage.
A new activation level computed in Eq. (3.41) can be viewed as a modified
weight of the query template, where only videos with significant activation levels
are considered good candidates for modifying the query template activations.
This, however, exploits only positive feedback. Anti-reinforcement learning can
be adopted to improve the speed of convergence [103, 323], whereby the original
query components are combined with a negative-feedback strategy to improve
retrieval effectiveness. Thus, as an alternative to Eq. (3.41), the following formula
is derived for the activation of the r-th video template node:
a_r^{(t)} = l_r / (1 + l_r^2)^{1/2},   r = 1, ..., R        (3.42)

l_r = w_{qr} + α Σ_{j∈Pos} a_j^{(v)} w_{jr} + β Σ_{j∈Neg} a_j^{(v)} w_{jr}        (3.43)
where a_j^{(v)} is the activation level of the j-th video, Pos is the set of j's such that
a_j^{(v)} > ε, and Neg is the set of j's such that a_j^{(v)} < −ε, where ε is a threshold
value. In addition, α and β are suitable positive and negative constants, respectively.

Table 3.8 provides a summary of the pseudo-RF learning algorithm implemented
by the adaptive cosine network. The input query weights w_{qr}, r = 1, ..., R, are
utilized to activate the video template nodes. These activations are then modified by
the activation levels of the video nodes in the positive and negative feedback sets.
The final network output is the video ranking result for video retrieval.
3.5.3
Experimental Results
This section describes an application of TFM video indexing and the adaptive cosine
network for video retrieval. The performance of the TFM method is compared with
the key-frame-based video indexing (KFVI) algorithm [104], which has become a
popular benchmark for shot-based video retrieval. Table 3.9 provides a summary of
the video data, obtained from the Informedia Digital Video Library Project [105].
This is a collection of CNN broadcast news, which includes full news stories, news
headlines, and commercial clips. The collection contains 844 video shots (see Fig. 3.15),
segmented by the color-histogram-based shot boundary detection algorithm [106].
A 48-bin histogram computed in HSV color space is used both for shot
segmentation and for the indexing algorithms. KFVI uses a histogram vector
generated from the middle frame of each video shot as the shot's representative.
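The KFVI indexing step described above can be sketched as follows, assuming the shot is given as an array of HSV frames with channel values in [0, 1]. The even 16/16/16 split of the 48 bins across the H, S, and V channels is an illustrative assumption, as the chapter does not specify the exact binning.

```python
import numpy as np

def kfvi_index(shot_hsv):
    """Index a shot by the 48-bin HSV histogram of its middle frame (sketch).

    shot_hsv : (n_frames, height, width, 3) array of HSV pixels in [0, 1].
    Returns a 48-dimensional feature vector normalized by pixel count.
    The 16-bins-per-channel allocation is an assumption.
    """
    key_frame = shot_hsv[len(shot_hsv) // 2]   # middle frame as the key frame
    hist = np.concatenate([
        np.histogram(key_frame[..., c], bins=16, range=(0.0, 1.0))[0]
        for c in range(3)                       # H, S, V channels
    ])
    return hist / hist.sum()
```

In a full system each shot's vector would then be rescaled by Gaussian normalization before the similarity comparison.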
The resulting feature database was scaled according to Gaussian normalization. In
the TFM method, a total of R = 5,000 templates were generated. Each video shot