the video nodes, inducing an order for the videos based on the corresponding node
activations at each stage.
A new activation level computed in Eq. (3.41) can be viewed as a modified
weight of the query template, where only videos with significant activation levels
are considered good candidates for modifying the query template activations.
This, however, exploits only positive feedback. Anti-reinforcement learning can
be adopted to improve the speed of convergence [103, 323], whereby the original
query components are combined with a negative-feedback strategy to improve
retrieval effectiveness. Thus, as an alternative to Eq. (3.41), the following formula
is derived for the activation of the r-th video template node:
a_r^{(t)} = l_r / (1 + l_r^2)^{1/2},   r = 1, ..., R        (3.42)

l_r = w_{qr} + α Σ_{j∈Pos} a_j^{(v)} w_{jr} + β Σ_{j∈Neg} a_j^{(v)} w_{jr}        (3.43)
where a_j^{(v)} is the activation level of the j-th video, Pos is the set of j's such that
a_j^{(v)} > ε, and Neg is the set of j's such that a_j^{(v)} < −ε, where ε is a threshold
value. In addition, α and β are suitable positive and negative constants, respectively.

Table 3.8 provides a summary of the pseudo-RF learning algorithm implemented
by the adaptive cosine network. The input query weights w_{qr}, r = 1, ..., R, are
utilized to activate the video template nodes. These activations are then modified by
the activation levels of the video nodes in the positive and negative feedback sets.
The final network output is the video ranking result for video retrieval.
3.5.3
Experimental Results
This section describes an application of TFM video indexing and the adaptive cosine
network for video retrieval. The performance of the TFM method is compared with
the key-frame-based video indexing (KFVI) algorithm [104], which has become a
popular benchmark for shot-based video retrieval. Table 3.9 provides a summary of
the video data, obtained from the Informedia Digital Video Library Project [105].
This is a collection of CNN broadcast news, which includes full news stories, news
headlines, and commercial clips. The collection contains 844 video shots (see Fig. 3.15),
segmented by the color-histogram-based shot boundary detection algorithm [106].
A 48-bin histogram computed in HSV color space is used both for shot
segmentation and for the indexing algorithms. KFVI uses a histogram vector
generated from the middle frame of each video shot as the shot's representative.
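The KFVI indexing step described above can be sketched as follows, assuming the shot is given as an array of HSV frames with channel values in [0, 1]. The even 16/16/16 split of the 48 bins across the H, S, and V channels is an illustrative assumption, as the chapter does not specify the exact binning.

```python
import numpy as np

def kfvi_index(shot_hsv):
    """Index a shot by the 48-bin HSV histogram of its middle frame (sketch).

    shot_hsv : (n_frames, height, width, 3) array of HSV pixels in [0, 1].
    Returns a 48-dimensional feature vector normalized by pixel count.
    The 16-bins-per-channel allocation is an assumption.
    """
    key_frame = shot_hsv[len(shot_hsv) // 2]   # middle frame as the key frame
    hist = np.concatenate([
        np.histogram(key_frame[..., c], bins=16, range=(0.0, 1.0))[0]
        for c in range(3)                       # H, S, V channels
    ])
    return hist / hist.sum()
```

In a full system each shot's vector would then be rescaled by Gaussian normalization before the similarity comparison.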
The resulting feature database was scaled according to Gaussian normalization. In
the TFM method, a total of R = 5,000 templates were generated. Each video shot