Table 3.6 Average precision rate (%) and number of user feedback cycles, obtained by
retrieving 20 queries from the Corel database, measured from the top 16 best matches

Method                 Average precision (%)   Number of user RF (Iter.)
Non-adaptive method    52.81                   -
Pseudo-RF              78.13                   -
User-controlled RF     82.19                   1
Semi-automatic RF      87.50                   1

point, the user-controlled RF method used on average 2.4 cycles of user interactions,
while the semi-automatic RF method used 1.6 cycles. This shows that the semi-
automatic method is the most effective learning strategy in terms of both retrieval
accuracy and minimization of user interaction. This also demonstrates that the
application of pseudo-RF in combination with perceptually significant features
extracted from the ROIs clearly enhanced the overall system performance.
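
The precision figures in Table 3.6 are measured over the top 16 matches of each query and averaged over the queries. A minimal sketch of that measurement is given below, assuming each query yields a ranked list of item identifiers and a set of ground-truth relevant identifiers. The function names and the toy data are illustrative assumptions and are not taken from the experiment.

```python
def precision_at_k(ranked_ids, relevant_ids, k=16):
    """Return the fraction of the top-k ranked items that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k


def average_precision_rate(per_query, k=16):
    """Return the mean precision-at-k over all queries, as a percentage.

    per_query is a list of (ranked_ids, relevant_ids) pairs, one per query.
    """
    rates = [precision_at_k(ranked, relevant, k) for ranked, relevant in per_query]
    return 100.0 * sum(rates) / len(rates)


# Toy data (hypothetical, not from the Corel experiment): two queries,
# each returning 16 item ids.
query_a = (list(range(16)), set(range(12)))               # 12 of 16 relevant
query_b = (list(range(100, 116)), {100, 101, 102, 103})   # 4 of 16 relevant
rate = average_precision_rate([query_a, query_b])         # mean of 75% and 25%
```

With 20 queries and 16 matches per query, each reported percentage corresponds to a count of relevant items out of 320 retrieved items.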
3.5 Video Re-ranking
Incorporating pseudo-RF is an important means of improving retrieval accuracy.
RF has been implemented for video retrieval [98, 99], where audio-visual
information is used to characterize the spatio-temporal content of a video
sequence. Applying RF to video files is, however, a time-consuming process,
since users have to play each retrieved video file, which is usually large, in
order to provide relevance feedback. In practice, this makes interaction with
sample video files difficult when retrieving from Internet databases. In this
section, RF is therefore implemented in an automatic fashion. The retrieval
system uses the template frequency model (TFM) to characterize both spatial and
temporal information. This representation allows RF to analyze the dynamic
content of the video effectively. The TFM follows the same principle as the
bag-of-words model. It is integrated with a cosine network [100] to implement
pseudo-RF, allowing further improvement of retrieval accuracy while minimizing
user interaction.
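
The idea of automatic (pseudo) relevance feedback can be sketched as follows. This is not the cosine network of [100], whose details are not given here. It is a simplified, Rocchio-style stand-in, assuming that every video is already represented by a fixed-length vector, that the database is non-empty, and that the top-ranked results can be treated as relevant without user input. The names `pseudo_rf_rerank`, `top_n`, and `alpha` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, database):
    """Return database indices sorted by decreasing similarity to the query."""
    return sorted(range(len(database)),
                  key=lambda i: cosine(query, database[i]),
                  reverse=True)


def pseudo_rf_rerank(query, database, top_n=3, alpha=0.5, iterations=1):
    """Re-rank by treating the top_n results as relevant (assumed scheme)."""
    q = list(query)
    for _ in range(iterations):
        order = rank(q, database)
        pseudo_relevant = [database[i] for i in order[:top_n]]
        # Centroid of the pseudo-relevant vectors, one value per dimension.
        centroid = [sum(col) / len(pseudo_relevant) for col in zip(*pseudo_relevant)]
        # Move the query toward the centroid; alpha keeps part of the original.
        q = [alpha * qi + (1 - alpha) * ci for qi, ci in zip(q, centroid)]
    return rank(q, database)


# Toy vectors (hypothetical): items 0 and 1 are close to the query.
database = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
reranked = pseudo_rf_rerank([1.0, 0.0, 0.0], database, top_n=2)
```

No video has to be played by the user in this loop, which is the practical motivation for automating the feedback step.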
3.5.1 Template Frequency Model Implementing Bag-of-Words Model
The template frequency model [101] views a video datum as a set of visual
templates, in the same spirit as bag-of-words modeling. Let $V$ be a video interval
that contains a finite set of frames $f_1, f_2, \ldots, f_M$. Also, let
$\mathbf{x}_m \in \mathbb{R}^P$ denote a feature vector (i.e., color histogram feature)
extracted from the $m$-th frame.
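
The bag-of-words analogy can be illustrated with a minimal sketch. It assumes that a set of visual templates (a codebook) is already available, that each frame feature vector is assigned to its single nearest template by squared Euclidean distance, and that the video interval is then described by how often each template occurs. The assignment and weighting rules of the TFM in [101] are not stated in this passage and may differ. The function names and toy values are hypothetical.

```python
def nearest_template(x, templates):
    """Return the index of the template closest to frame feature vector x."""
    best_index, best_dist = 0, float("inf")
    for j, t in enumerate(templates):
        dist = sum((xi - ti) ** 2 for xi, ti in zip(x, t))  # squared Euclidean
        if dist < best_dist:
            best_index, best_dist = j, dist
    return best_index


def template_frequencies(frames, templates):
    """Count how many frames of a video interval map to each template."""
    counts = [0] * len(templates)
    for x in frames:
        counts[nearest_template(x, templates)] += 1
    return counts


# Toy example (hypothetical): M = 3 frames with P = 2 features each,
# and a codebook of two templates.
templates = [[1.0, 0.0], [0.0, 1.0]]
frames = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
video_vector = template_frequencies(frames, templates)  # one count per template
```

The resulting count vector plays the role that a term-frequency vector plays for a text document, so vector-space similarity measures such as the cosine can be applied to whole video intervals.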