All the above descriptors were quantized into 10 levels, thus providing a feature
set of 30 dimensions.
7.6.2.3 MFCC Feature Mapping
Because most of the video shots contained heavy crowd noise, and because we
wished to capture the perceived rhythm and sound of the spoken content, we needed
a feature that models human hearing and also works well under noisy conditions.
Mel-frequency cepstral coefficients (MFCC) have been used extensively in speech
recognition systems, as they emphasize the frequencies that are more perceptible
to the human ear.
First, the audio file is pre-processed to remove the silent segments. Then
13 MFCCs are extracted for each segment. Adjacent segments overlap by 50%, so
there is a lot of redundancy between adjacent MFCC values. To reduce the
dimension of the resulting matrix, the MFCC values are passed to a feature
reduction stage, which reduces the MFCC features to a 12 × 64 matrix.
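The pre-processing steps described above (silence removal followed by segmentation with 50% overlap) might be sketched as follows. This is only an illustration under stated assumptions: the text does not say how silence is detected, so a simple RMS-energy threshold is assumed here, and the function names `remove_silence` and `frame_with_overlap`, the `frame_len` parameter, and the `threshold_ratio` value are all hypothetical. The MFCC computation and the feature reduction stage are not shown.

```python
import numpy as np


def remove_silence(signal, frame_len, threshold_ratio=0.1):
    """Drop non-overlapping blocks whose RMS energy is low.

    Assumption: a block counts as "silent" when its RMS is below
    threshold_ratio times the largest block RMS in the signal.
    """
    n_blocks = len(signal) // frame_len
    if n_blocks == 0:
        return signal[:0]
    # Trailing samples that do not fill a whole block are discarded.
    blocks = signal[: n_blocks * frame_len].reshape(n_blocks, frame_len)
    rms = np.sqrt(np.mean(blocks ** 2, axis=1))
    keep = rms >= threshold_ratio * rms.max()
    return blocks[keep].reshape(-1)


def frame_with_overlap(signal, frame_len):
    """Split a 1-D signal into frames of frame_len samples with 50% overlap."""
    hop = frame_len // 2  # 50% overlap: each frame starts half a frame later
    n_frames = max(0, 1 + (len(signal) - frame_len) // hop)
    starts = hop * np.arange(n_frames)[:, None]
    offsets = np.arange(frame_len)[None, :]
    return signal[starts + offsets]
```

Because consecutive frames share half of their samples, the coefficients computed from adjacent frames are strongly correlated, which is the redundancy that the feature reduction stage is meant to remove.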
7.6.3 Experimental Results
Fisher's Linear Discriminant Analysis (LDA) is employed as the classification
scheme to evaluate the efficacy of the feature set. More specifically, LDA
commonly refers to techniques in which a transformation is applied in order to
maximize between-class separability and minimize within-class variability. LDA
works on the feature set with no prior assumptions about the nature of the data
set. It computes a weight vector $w$ which, when multiplied by the input feature
vector $x$, generates the discriminant functions $g_i(x)$. For a $C$-class
problem, we define $C$ discriminant functions $g_1(x), g_2(x), \ldots, g_C(x)$.
The feature vector $x$ is assigned to the class whose discriminant function
takes the largest value at $x$.
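As a rough illustration of the decision rule, the sketch below builds one linear discriminant function per class and assigns each sample to the class with the largest score. The text does not give the exact form of $g_i(x)$, so a common textbook form is assumed here: $g_i(x) = w_i^T x + b_i$ with $w_i = S_w^{-1}\mu_i$, where $S_w$ is the pooled within-class covariance and $\mu_i$ the class mean. The function names `fit_lda`, `discriminant_scores` and `predict_lda` are hypothetical.

```python
import numpy as np


def fit_lda(X, y):
    """Estimate one weight vector and bias per class (assumed textbook form)."""
    classes = np.unique(y)
    n, d = X.shape
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class covariance S_w.
    Sw = np.zeros((d, d))
    for c, mu in zip(classes, means):
        diff = X[y == c] - mu
        Sw += diff.T @ diff
    Sw /= n - len(classes)
    Sw_inv = np.linalg.pinv(Sw)  # pseudo-inverse guards against a singular S_w
    W = means @ Sw_inv  # row i holds w_i = S_w^{-1} mu_i (S_w is symmetric)
    priors = np.array([np.mean(y == c) for c in classes])
    b = -0.5 * np.sum(W * means, axis=1) + np.log(priors)
    return classes, W, b


def discriminant_scores(model, X):
    """Return g_i(x) for every sample (rows) and every class (columns)."""
    _, W, b = model
    return X @ W.T + b


def predict_lda(model, X):
    """Assign each sample to the class with the largest discriminant value."""
    classes = model[0]
    return classes[np.argmax(discriminant_scores(model, X), axis=1)]
```

In this sketch a $C$-class problem yields a score matrix with $C$ columns, and the predicted label is simply the index of the row-wise maximum.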
All the results are based on Fisher's LDA classification technique. To minimize
the bias of the sample set, leave-one-out classification was implemented. With
this method, one sample is removed from the database and used as the test set,
and the classifier is trained with the remaining samples. This process is
repeated for each sample in the database, which ensures that the classification
scheme does not contain bias due to the sample set size [ 219 ].
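The leave-one-out procedure can be sketched generically as below. To keep the example self-contained, a nearest-class-mean classifier is used as a stand-in; it is not the LDA classifier used in the experiments, and the names `leave_one_out_accuracy`, `fit_nearest_mean` and `predict_nearest_mean` are hypothetical.

```python
import numpy as np


def leave_one_out_accuracy(X, y, fit, predict):
    """Hold out each sample once, train on the rest, and test on the held-out one."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i  # boolean mask that excludes sample i
        model = fit(X[train], y[train])
        predicted = predict(model, X[i:i + 1])[0]
        correct += int(predicted == y[i])
    return correct / n


def fit_nearest_mean(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, means


def predict_nearest_mean(model, X):
    """Assign each sample to the class with the closest mean."""
    classes, means = model
    dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]
```

Any classifier exposing the same `fit`/`predict` pair of callables could be passed in; for a database of N samples the classifier is trained N times, each time on N - 1 samples.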
Feature selection was also performed using the Wilks' lambda criterion in order
to optimize the feature space. The dimension of the feature space was large, and
some of the features did not enhance discrimination between classes. Therefore,
in the feature selection phase, features that were redundant or that degraded
the overall classification accuracy were removed.
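For illustration, Wilks' lambda for a set of features is the ratio det(W) / det(T), where W is the within-class scatter matrix and T is the total scatter matrix; values near 0 indicate strong class separation and values near 1 indicate little. The text does not describe the selection procedure itself, so the greedy forward search below is an assumption, and the names `wilks_lambda` and `forward_select` and the parameter `k` are hypothetical.

```python
import numpy as np


def wilks_lambda(X, y):
    """Wilks' lambda = det(within-class scatter) / det(total scatter)."""
    centered = X - X.mean(axis=0)
    total = centered.T @ centered
    within = np.zeros_like(total)
    for c in np.unique(y):
        diff = X[y == c] - X[y == c].mean(axis=0)
        within += diff.T @ diff
    return np.linalg.det(within) / np.linalg.det(total)


def forward_select(X, y, k):
    """Greedy forward selection (assumed procedure).

    At each step, add the feature that gives the smallest Wilks' lambda
    when combined with the features already selected.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < k:
        best = min(remaining,
                   key=lambda j: wilks_lambda(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

A feature that barely lowers lambda once the others are in the set adds little discrimination, which is one way redundant features could be identified and dropped.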
The test database consists of 200 video shots with durations varying from 5 s to
about 25 s. In the database, there are 88 pass plays, 67 run plays and 45 kicking
plays. A total of eight different teams were used to create the database from four