Audio-Visual Fusion for Film Database Retrieval and Classification - Multimedia Database Retrieval: Technology and Applications

key frames, and this method is denoted as the MKF (multiple key frame) method.

For comparison, the single frame closest to a cluster centroid

was selected as a key frame, and this method is denoted as the SKF (single key

frame) method. Video content similarity matching used by the SKF was obtained

by comparing the descriptor vectors of the selected key frames of the query and the

target videos. However, for the MKF method, similarity measures are obtained by

matching multiple key frames of the query against multiple key frames in the target

video clips. To be precise, let S be a similarity score. The similarity was obtained by:

$$ S = \sum_{i=1}^{N} s_i \tag{10.15} $$

$$ s_i = \min_{j = 1, \ldots, M} \{ d[i, j] \} \tag{10.16} $$

where d[i, j] is the distance between the i-th key frame of the query and the j-th

key frame of the target video; N and M are the total numbers of key frames of the

query and target videos, respectively.
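
The MKF matching rule of Eqs. (10.15) and (10.16) can be illustrated with a minimal sketch. The function name mkf_score and the use of Euclidean distance for d[i, j] are assumptions made for illustration only; the text does not state which distance was used. Because S is a sum of distances, a smaller S indicates a closer match under this formulation.

```python
import math


def mkf_score(query_frames, target_frames, dist=math.dist):
    """Sum, over the N query key frames, of the distance to the nearest
    of the M target key frames (Eqs. 10.15 and 10.16).

    `dist` defaults to Euclidean distance, which is an assumption;
    any distance between descriptor vectors can be passed instead.
    """
    if not query_frames or not target_frames:
        raise ValueError("both videos need at least one key frame")
    # s_i = min over j of d[i, j]   (Eq. 10.16)
    per_frame = [min(dist(q, t) for t in target_frames) for q in query_frames]
    # S = sum over i of s_i         (Eq. 10.15)
    return sum(per_frame)


# Toy 2-D descriptors (hypothetical values, not from the experiments).
query = [(0.0, 0.0), (3.0, 4.0)]
target = [(0.0, 0.0), (0.0, 4.0), (6.0, 8.0)]
print(mkf_score(query, target))
```

With a single key frame per video (N = M = 1), the same function reduces to the SKF comparison of one descriptor pair.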

From the results, it is observed that although the SKF method can be used for

retrieval of video shots, SKF is less effective in characterizing video content of

video clips. The SKF result achieved 39.22 % precision. By considering multiple

key frames as in the MKF method, the performance of the key-frame based video

indexing method can be improved to 62.34 %. However, this result is approximately

10 % less precise than that of TFM.

To achieve high retrieval performance, the iARM system was implemented

using both the automatic and semi-automatic retrieval algorithms. Pseudo-relevance

feedback using the adaptive cosine network architecture (discussed in

Chap. 3) was employed. In this case, depending on the internet traffic conditions,

users can submit automatic and semi-automatic queries, and the automatic query

can avoid the transmission of training sample video files over the internet. Using

the same set of queries as in the previous results, this system first performed an

automatic retrieval for each query to adaptively improve its performance. After three

iterations of signal propagation in the adaptive cosine network, the system was then

assisted by users. Table 10.4 summarizes the retrieval results obtained

by the automatic and semi-automatic methods. It is observed that the semi-automatic

method was superior to the automatic method and the user interaction method.

The best performance was achieved at 92.03 % precision. In addition, the moderate

performance of the automatic method can be beneficial to the user when internet

resources are limited.
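
The automatic stage described above can be pictured with a generic pseudo-relevance feedback loop. This sketch is not the adaptive cosine network of Chap. 3: it substitutes a simple centroid-based query update with cosine ranking. The names rank and pseudo_relevance_feedback and the parameters top_k and alpha are hypothetical. Only the three automatic iterations come from the text, and the user-assisted step that follows them is not modelled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, database):
    """Database indices sorted by decreasing similarity to the query."""
    return sorted(range(len(database)),
                  key=lambda i: cosine(query, database[i]),
                  reverse=True)


def pseudo_relevance_feedback(query, database, top_k=2, iterations=3, alpha=0.5):
    """Refine the query without user input: treat the current top_k
    results as relevant and move the query toward their centroid."""
    q = list(query)
    for _ in range(iterations):
        top = rank(q, database)[:top_k]
        centroid = [sum(database[i][d] for i in top) / len(top)
                    for d in range(len(q))]
        # Blend the old query with the pseudo-relevant centroid.
        q = [(1 - alpha) * x + alpha * c for x, c in zip(q, centroid)]
    return rank(q, database)


# Toy descriptors (hypothetical values).
database = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
result = pseudo_relevance_feedback((1.0, 0.2), database)
print(result)
```

Since the loop runs entirely on the server-side descriptors, no training sample videos need to be transmitted, which is the property the text attributes to the automatic query.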

The strength of the iARM system was evaluated against a variety of templates

used by TFM for indexing video clips. Specifically, three sets of templates at

T_c = 500, 1,000 and 1,500 were generated using a competitive learning algorithm,

where T_c denotes the number of templates. These are approximately 3 %, 6 %,

and 9 % of the training sample set, respectively. For each set of templates, video
