macro-blocks are coded with a motion vector. The two features $\mu_{mv}$ and $\sigma_{mv}$, the mean and standard deviation of the motion vectors, are used together in the algorithm to detect the starting point of the play event. In practice, a set of video shots can be selected from each category and used to estimate the thresholds for the mean and standard deviation of the motion vectors.
Figure 7.22 shows the flow chart of the algorithm that estimates the frame representing the starting point of the play event. The following steps detail the algorithm:
Step 1: Find a P frame whose mean motion-vector value is 4 or higher.
Step 2: Compute the gradient of the mean values over a window of three or four adjacent frames.
Step 3: If all the gradients are positive, mark the frame as a possible starting point; otherwise, go back to Step 1.
Step 4: If the intensity of the motion descriptor is 2 or higher, return the frame number as the starting point.
Step 5: Otherwise, compute the gradient of the standard deviation values over a window of three or four adjacent frames.
Step 6: If all the gradients are positive, return the frame number as the starting point; otherwise, go back to Step 1.
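The six steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the per-frame mean and standard deviation are taken over motion-vector magnitudes, that each gradient window starts at the candidate frame and looks forward, and that a per-frame motion-activity intensity value is already available. The helper names (`frame_features`, `gradients_all_positive`, `detect_start`) are hypothetical; only the thresholds 4 and 2 and the window of three or four frames come from the text.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple

# Thresholds quoted in the text. In practice they would be estimated from
# a training set of shots, so treat them as tunable parameters.
MEAN_THRESHOLD = 4.0       # Step 1: minimum mean motion-vector value
INTENSITY_THRESHOLD = 2    # Step 4: minimum motion-descriptor intensity
WINDOW = 3                 # Steps 2 and 5: three or four adjacent frames


def frame_features(motion_vectors: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (mean, standard deviation) for one P frame.

    Assumption: both statistics are taken over motion-vector magnitudes.
    """
    magnitudes = [math.hypot(dx, dy) for dx, dy in motion_vectors]
    if not magnitudes:
        return 0.0, 0.0
    return statistics.fmean(magnitudes), statistics.pstdev(magnitudes)


def gradients_all_positive(values: Sequence[float], start: int, window: int) -> bool:
    """True if values rise strictly across `window` frames beginning at `start`.

    Assumption: the window starts at the candidate frame and looks forward.
    """
    segment = values[start:start + window]
    if len(segment) < window:
        return False  # not enough frames left to fill the window
    return all(b - a > 0 for a, b in zip(segment, segment[1:]))


def detect_start(
    means: Sequence[float],
    stds: Sequence[float],
    intensities: Sequence[float],
    window: int = WINDOW,
) -> Optional[int]:
    """Return the index of the estimated starting P frame, or None."""
    for i, mean in enumerate(means):
        if mean < MEAN_THRESHOLD:                         # Step 1
            continue
        if not gradients_all_positive(means, i, window):  # Steps 2-3
            continue  # "go back to Step 1": try the next P frame
        if intensities[i] >= INTENSITY_THRESHOLD:         # Step 4
            return i
        if gradients_all_positive(stds, i, window):       # Steps 5-6
            return i
    return None
```

The returned value indexes the sequence of P frames; mapping it back to an absolute frame number depends on the GOP structure of the stream, which is not specified here.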
7.6.1.2 Evaluation of Play Event Detection Algorithm
The play event detection algorithm was tested on the American football video shot database, which consists of 200 video shots taken from 4 different games and 4 different networks. To measure the performance of the algorithm, we first had to establish ground truth for the starting point of the play event within each video shot. This was done by having an observer manually index the frame number that best represented the start of the play event.
Results were compared by computing the difference (delta) between the ground-truth frame number and the frame number estimated by the algorithm. This delta then had to be interpreted in the actual time domain; that is, we needed to determine whether the algorithm estimated the starting point too early or only after a certain delay.
Since the MPEG-1 video in the database has a frame rate of 30 frames/s, a histogram with a bin size of 30 frames (one bin per second) gives a general idea of how far the estimated frame numbers are from the ground truth in actual time. Figure 7.23 shows a histogram of the number of shots within each time unit. Negative time units represent early detection and positive time units represent delayed detection.
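The evaluation procedure can be sketched the same way. This is a hypothetical illustration, not the authors' evaluation code: it assumes the delta is taken as estimated minus ground-truth frame number (so negative values mean early detection), a frame rate of 30 frames/s, and that "within ±1 s" means an absolute delta of at most 30 frames. The function names are made up for this example.

```python
from collections import Counter
from typing import Dict, List, Sequence

FPS = 30  # frame rate assumed for the MPEG-1 shots


def frame_deltas(ground_truth: Sequence[int], estimated: Sequence[int]) -> List[int]:
    """Signed delta per shot: negative = early detection, positive = delayed."""
    return [est - gt for gt, est in zip(ground_truth, estimated)]


def time_histogram(deltas: Sequence[int], bin_size: int = FPS) -> Dict[int, int]:
    """Count shots per time unit; bin k covers deltas in [k*bin_size, (k+1)*bin_size)."""
    # Python's floor division sends -1..-30 to bin -1 and 0..29 to bin 0.
    return Counter(d // bin_size for d in deltas)


def accuracy_within(deltas: Sequence[int], tolerance_frames: int = FPS) -> float:
    """Fraction of shots whose |delta| is at most the tolerance (±1 s by default)."""
    hits = sum(abs(d) <= tolerance_frames for d in deltas)
    return hits / len(deltas)
```

For example, 166 shots within tolerance out of 200 gives 166/200 = 0.83, i.e. 83 % accuracy.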
From Fig. 7.23, we can see that the algorithm detects the starting point of the play with 83% accuracy; that is, for 166 of the 200 video shots in the database, the starting point was detected within ±1 s of the ground-truth starting point. The accuracy of the algorithm can be increased to 86.5% by increasing the window size from three