Indexing, Object Segmentation, and Event Detection in News and Sports Videos - Multimedia Database Retrieval: Technology and Applications


All the above descriptors were quantized into 10 levels, thus providing a feature

set of 30 dimensions.

7.6.2.3 MFCC Feature Mapping

Because most of the video shots contained a lot of crowd noise, and because we
wished to capture the perceived rhythm and sound of the spoken content, we needed
a feature that models human hearing and also works well under noisy conditions.
Mel-frequency cepstral coefficients (MFCC) have been used extensively in speech
recognition systems, as they emphasize the frequencies that are more perceptible
to the human ear.

First, the audio file is pre-processed to remove the silent segments. Then
13 MFCC coefficients are extracted for each segment. Adjacent segments overlap by
50 %, so there is a lot of redundancy between adjacent MFCC values. To reduce the
dimension of the resulting matrix, the MFCC values are passed to a feature
reduction stage, which reduces them to a 12 × 64 matrix.
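The pre-processing steps can be illustrated with a minimal sketch. It covers only the silence removal and the 50 % overlapping segmentation; the MFCC computation and the feature reduction stage are not shown. The function names, the energy threshold, and the block and frame lengths are assumptions made for illustration and are not taken from the text.

```python
import math

def remove_silence(samples, block_len, threshold):
    """Keep only the blocks whose mean energy exceeds the threshold (assumed criterion)."""
    kept = []
    for start in range(0, len(samples), block_len):
        block = samples[start:start + block_len]
        energy = sum(s * s for s in block) / len(block)
        if energy > threshold:
            kept.extend(block)
    return kept

def frame_signal(samples, frame_len):
    """Split samples into frames that overlap by 50 % (hop = frame_len // 2)."""
    hop = frame_len // 2
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

# Toy signal: 8 silent samples followed by 16 samples of a sine tone.
signal = [0.0] * 8 + [math.sin(2 * math.pi * n / 8) for n in range(16)]
voiced = remove_silence(signal, block_len=8, threshold=0.01)
frames = frame_signal(voiced, frame_len=8)
```

Each frame shares its second half with the first half of the next frame, which is the source of the redundancy between adjacent MFCC values mentioned above.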

7.6.3 Experimental Results

Fisher's Linear Discriminant Analysis (LDA) is employed as a classification scheme
to evaluate the efficacy of the feature set. In a specific sense, LDA also commonly
refers to techniques in which a transformation is applied in order to maximize
between-class separability and minimize within-class variability. LDA works on the
feature set with no prior assumptions about the nature of the data set. It computes
a weight vector w which, when multiplied by the input feature vector x, generates
a discriminant function $g_i(x)$. For a C-class problem, we define C discriminant
functions $g_1(x), g_2(x), \ldots, g_C(x)$. The feature vector x is assigned to the
class whose discriminant function takes the largest value at x.
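The decision rule can be sketched as follows. The sketch assumes linear discriminant functions of the form g_i(x) = w_i · x + b_i with one weight vector per class; the bias terms, the function names, and the toy weights are illustrative assumptions, and the training step that would estimate the weights from data is not shown.

```python
def discriminant(w, b, x):
    """Linear discriminant g_i(x) = w_i . x + b_i (bias term is an assumption)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(weights, biases, x):
    """Return the index of the class whose discriminant function is largest at x."""
    scores = [discriminant(w, b, x) for w, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy three-class problem with hand-picked (not trained) weights.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
label = classify(weights, biases, [0.2, 0.9])
```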

All results are based on Fisher's LDA classification technique. To minimize the
bias of the sample set, leave-one-out classification was used. With this method,
one sample is removed from the database and used as the test set, and the
classifier is trained on the remaining samples. This process is repeated for every
sample in the database, which ensures that the classification scheme is not biased
by the sample set size [219].
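The leave-one-out procedure can be sketched in a few lines. To keep the example self-contained, a simple nearest-class-mean classifier stands in for Fisher's LDA; this substitution, the function names, and the toy data are assumptions made for illustration only.

```python
def nearest_mean_fit(samples, labels):
    """Compute the mean feature vector of each class (stand-in for LDA training)."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def nearest_mean_predict(means, x):
    """Assign x to the class with the closest mean (squared Euclidean distance)."""
    def dist2(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return min(means, key=lambda y: dist2(means[y]))

def leave_one_out_accuracy(samples, labels):
    """Hold out each sample once, train on the rest, and report the hit rate."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        means = nearest_mean_fit(train_x, train_y)
        if nearest_mean_predict(means, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

# Toy data: two well-separated classes in a two-dimensional feature space.
samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
labels = ["pass", "pass", "pass", "run", "run", "run"]
accuracy = leave_one_out_accuracy(samples, labels)
```

With N samples the classifier is trained N times, so every sample is tested exactly once by a model that never saw it.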

Feature selection was also performed using the Wilks' Lambda criterion in order to
optimize the feature space. The dimension of the feature space was large, and some
of the features did not enhance discrimination between classes. Therefore, in the
feature selection phase, the features that were redundant and degraded the overall
classification accuracy were removed.
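As a rough illustration of the criterion, the sketch below computes Wilks' Lambda for a single feature, where it reduces to the ratio of the within-class sum of squares to the total sum of squares. Values near 0 indicate a feature that separates the classes well, and values near 1 indicate a feature that contributes little. The text does not describe the exact selection procedure, so the univariate form, the function name, and the toy data are assumptions.

```python
def wilks_lambda(values, labels):
    """Univariate Wilks' Lambda: within-class sum of squares / total sum of squares."""
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 1.0  # a constant feature cannot discriminate between classes
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    within_ss = 0.0
    for vals in groups.values():
        mean = sum(vals) / len(vals)
        within_ss += sum((v - mean) ** 2 for v in vals)
    return within_ss / total_ss

labels = ["pass", "pass", "run", "run"]
good_feature = [0.0, 0.0, 1.0, 1.0]   # perfectly separates the two classes
poor_feature = [0.0, 1.0, 0.0, 1.0]   # identical distribution in both classes
good_score = wilks_lambda(good_feature, labels)
poor_score = wilks_lambda(poor_feature, labels)
```

A selection step built on this score would retain the features with the lowest Lambda and discard those close to 1.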

The test database consists of 200 video shots with durations varying from 5 s to

about 25 s. In the database, there are 88 pass plays, 67 run plays and 45 kicking

plays. A total of eight different teams were used to create the database from four

