Indexing, Object Segmentation, and Event Detection in News and Sports Videos - Multimedia Database Retrieval: Technology and Applications - page 206


Table 7.3 Play event classification results, obtained by multiple feature types

Play category   MPEG-7 audio + MFCC   MPEG-7 motion + audio   MPEG-7 motion + MFCC   MPEG-7 motion + audio + MFCC
Pass            70.0 %                85.2 %                  85.2 %                 94.3 %
Run             59.7 %                91.0 %                  92.5 %                 89.6 %
FG/XP           75.0 %                87.5 %                  87.5 %                 93.8 %
K/P             69.0 %                82.8 %                  82.8 %                 93.1 %
Overall         67.0 %                87.0 %                  87.5 %                 92.5 %

Table 7.4 Play event classification results, obtained by three sets of features, based on motion combined with other modalities

Method                         Pass     Run      FG/XP    K/P
MPEG-7 motion                  79.5 %   92.5 %   87.5 %   65.5 %
MPEG-7 motion + audio          85.2 %   91.0 %   87.5 %   82.8 %
MPEG-7 motion + audio + MFCC   94.3 %   89.6 %   93.8 %   93.1 %

different networks. This variety in the database ensured that the sample space of the

current work was diverse and included all the major broadcasters.

Table 7.3 shows the indexing results obtained using MPEG-7 motion and audio descriptors along with MFCC features. From the table, we can see that classification accuracy increases as multi-modal features are combined. Combining the MPEG-7 audio descriptors with MFCC features gives an overall increase of 10 %, while combining the audio features with the motion descriptor features gives an increase of 5 %. Combining all three feature types produces an overall classification accuracy of 92.5 %.
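The text does not say how the feature types are combined before classification. As a minimal sketch, assuming simple feature-level fusion (each modality is standardized and the per-clip vectors are concatenated), the combination could look like the following. The function names and the toy feature values are hypothetical and are not taken from the chapter.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature dimension across clips (zero mean, unit variance)."""
    columns = list(zip(*rows))
    # Fall back to a scale of 1.0 when a dimension is constant.
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def fuse(*modalities):
    """Concatenate the standardized per-clip vectors of several modalities."""
    normalized = [zscore_columns(m) for m in modalities]
    return [sum(parts, []) for parts in zip(*normalized)]


# Hypothetical toy data: 3 clips, 2 motion dims, 1 audio dim, 2 MFCC dims.
motion = [[0.1, 0.9], [0.4, 0.5], [0.7, 0.1]]
audio = [[12.0], [15.0], [18.0]]
mfcc = [[-3.0, 1.0], [0.0, 2.0], [3.0, 3.0]]

fused = fuse(motion, audio, mfcc)
print(len(fused), len(fused[0]))  # prints "3 5"
```

Standardizing first keeps a modality with large raw values (here the audio dimension) from dominating the fused vector. Other fusion schemes, such as combining per-modality classifier scores, are equally plausible readings of the text.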

Combining multi-modal features in a reasonable fashion can enhance classification. However, there are always trade-offs to consider. Some features may reduce the classification accuracy for a particular category while enhancing the overall performance of the system. Table 7.4 shows the variations in classification that result from adding audio features to the motion features.
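The trade-off can be read directly from Table 7.4. The short sketch below uses the accuracies transcribed from that table and computes the per-category change, in percentage points, between two feature sets. The dictionary layout and the helper name `accuracy_change` are illustrative choices only.

```python
# Per-category accuracies (%) transcribed from Table 7.4.
# Category labels follow Table 7.3 (Pass, Run, FG/XP, K/P).
table_7_4 = {
    "MPEG-7 motion": {"Pass": 79.5, "Run": 92.5, "FG/XP": 87.5, "K/P": 65.5},
    "MPEG-7 motion + audio": {"Pass": 85.2, "Run": 91.0, "FG/XP": 87.5, "K/P": 82.8},
    "MPEG-7 motion + audio + MFCC": {"Pass": 94.3, "Run": 89.6, "FG/XP": 93.8, "K/P": 93.1},
}


def accuracy_change(baseline, extended):
    """Return the change in accuracy (percentage points) for each play category."""
    base, ext = table_7_4[baseline], table_7_4[extended]
    return {category: round(ext[category] - base[category], 1) for category in base}


change = accuracy_change("MPEG-7 motion", "MPEG-7 motion + audio + MFCC")
for category, delta in change.items():
    print(f"{category}: {delta:+.1f}")
```

With all three feature types, Run is the only category that loses accuracy relative to motion alone (2.9 percentage points), while Pass, FG/XP, and K/P all gain.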

7.7 Summary

This chapter covers a broad spectrum of video segmentation, indexing, retrieval, and classification techniques applicable to news and sports videos. A shot detection algorithm for MPEG video data in the compressed domain can be developed from the energy histogram of DCT coefficients. The detection results can be enhanced by using the ratio between two sliding windows to attenuate the low-pass filtered frame distances. The advantage is a high detection rate at low computational complexity. In a subsequent process, news videos can be segmented at the shot, group-of-shots, and story levels, where the template frequency model can be applied to capture the spatio-temporal information. This facilitates video retrieval
