Self-adaptation in Image and Video Retrieval - Multimedia Database Retrieval: Technology and Applications


Table 3.6 Average precision rate (%) and number of user feedback cycles, obtained by retrieving 20 queries from the Corel database, measured from the top 16 best matches

Method                 Average precision (%)   Number of user RF (Iter.)
Non-adaptive method    52.81                   -
Pseudo-RF              78.13                   -
User-controlled RF     82.19                   1
Semi-automatic RF      87.50                   1

point, the user-controlled RF method used on average 2.4 cycles of user interactions, while the semi-automatic RF method used 1.6 cycles. This shows that the semi-automatic method is the most effective learning strategy in terms of both retrieval accuracy and minimization of user interaction. This also demonstrates that the application of pseudo-RF in combination with perceptually significant features extracted from the ROIs clearly enhanced the overall system performance.
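The precision figures in Table 3.6 are measured over the top 16 matches and averaged over the queries. A minimal sketch of that measurement is given below, assuming precision for one query is the fraction of the top 16 retrieved items that are relevant. The function names and the toy data are hypothetical and are not taken from the original experiments.

```python
def precision_at_k(ranked_ids, relevant_ids, k=16):
    """Fraction of the top-k retrieved items that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k


def average_precision_rate(results, k=16):
    """Mean precision-at-k, in percent, over (ranked_ids, relevant_ids) pairs."""
    rates = [precision_at_k(ranked, relevant, k) for ranked, relevant in results]
    return 100.0 * sum(rates) / len(rates)


# Hypothetical toy data: two queries, 16 retrieved items each.
toy_results = [
    (list(range(16)), set(range(8))),    # 8 of the 16 matches are relevant
    (list(range(16)), set(range(12))),   # 12 of the 16 matches are relevant
]
toy_rate = average_precision_rate(toy_results)
```

In the setting of Table 3.6, `results` would hold one entry per query (20 in total), each with its ranked list and its ground-truth relevant set.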

3.5 Video Re-ranking

Incorporating pseudo-RF is an important way to improve retrieval accuracy. RF has been implemented for video retrieval [98, 99], where audio-visual information is used to characterize the spatio-temporal content of a video sequence. Applying RF to video files is, however, a time-consuming process, since users have to play each retrieved video file, which is usually large, in order to provide relevance feedback. In practice, this interaction becomes even more difficult when the sample video files are retrieved from Internet databases. In this section, RF is therefore implemented in an automatic fashion. The retrieval system uses the template frequency model (TFM) to characterize both spatial and temporal information. This representation allows RF to analyze the dynamic content of the video effectively. The TFM follows the same principle as the bag-of-words model. It is integrated with a cosine network [100] to implement pseudo-RF, which allows further improvement of retrieval accuracy while minimizing user interactions.
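The idea of automatic (pseudo) relevance feedback can be illustrated with a small sketch. Note that this is not the cosine network of [100]: it swaps in a simple Rocchio-style query update, in which the top-ranked results are assumed to be relevant and the query is moved toward their centroid before re-ranking. The parameters `top_n`, `alpha`, and `beta`, as well as the toy vectors, are assumptions made only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank(query, database):
    """Database indices sorted by decreasing cosine similarity to the query."""
    scores = [cosine(query, vec) for vec in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)


def pseudo_rf_rerank(query, database, top_n=2, alpha=1.0, beta=0.5, rounds=1):
    """Re-rank by treating the top_n results as relevant, with no user input."""
    q = list(query)
    for _ in range(rounds):
        top = rank(q, database)[:top_n]
        # Centroid of the pseudo-relevant items.
        centroid = [sum(database[i][d] for i in top) / len(top)
                    for d in range(len(q))]
        # Rocchio-style update: move the query toward the centroid.
        q = [alpha * qd + beta * cd for qd, cd in zip(q, centroid)]
    return rank(q, database)


# Hypothetical toy database of 3-dimensional feature vectors.
toy_database = [[4, 1, 0], [3, 2, 0], [1, 0, 1], [2, 3, 0]]
toy_query = [1, 0, 0]
initial_order = rank(toy_query, toy_database)
reranked_order = pseudo_rf_rerank(toy_query, toy_database)
```

In this toy case the last item shares the second feature dimension with the two top-ranked items, so it moves ahead of the third item after one feedback round, without any user interaction.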

3.5.1 Template Frequency Model Implementing Bag-of-Words Model

The template frequency model [101] views a video datum as a set of visual templates, in the same spirit as bag-of-words modeling. Let $V$ be a video interval that contains a finite set of frames $f_1, f_2, \ldots, f_M$. Also, let $x_m \in \mathbb{R}^P$ denote a feature vector (i.e., color histogram feature) extracted from the $m$-th frame.
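The bag-of-words analogy can be sketched as follows: each frame feature vector $x_m$ is mapped to a visual template, and the video interval is then described by how often each template occurs. The sketch assumes that a set of templates is already available and that each frame is assigned to its single nearest template under Euclidean distance. Both assumptions, along with all names and toy values, are hypothetical and may differ from the model in [101].

```python
def nearest_template(x, templates):
    """Index of the template closest to feature vector x (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(templates)), key=lambda r: sq_dist(x, templates[r]))


def template_frequencies(frames, templates):
    """Count how often each template is the nearest match over all frames."""
    counts = [0] * len(templates)
    for x in frames:
        counts[nearest_template(x, templates)] += 1
    return counts


# Hypothetical 2-bin "histograms" for M = 4 frames and 2 templates.
toy_templates = [[1.0, 0.0], [0.0, 1.0]]
toy_frames = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.7, 0.3]]
tf_vector = template_frequencies(toy_frames, toy_templates)
```

The resulting count vector plays the role of the word-frequency vector in text retrieval, so two video intervals can be compared in the same way as two documents.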
