Indexing, Object Segmentation, and Event Detection in News and Sports Videos - Multimedia Database Retrieval: Technology and Applications


macro-blocks are coded with a motion vector. The two features μ_mv and σ_mv are used collaboratively in the algorithm to detect the starting point of the play event.

In practice, in order to detect the starting point, a set of video shots can be selected from each category and used to estimate the thresholds for the mean and standard deviation of the motion vectors.

Figure 7.22 shows the flow chart of the algorithm to estimate the frame which represents the starting point of the play event. The following steps detail the algorithm:

Step 1: Find a P frame with a mean value of 4 or higher.

Step 2: Determine the gradient of the mean values within a window (three or four adjacent frames).

Step 3: If the gradients are all positive, mark the frame as a possible starting point, else go back to Step 1.

Step 4: If the intensity of the motion descriptor has a value of 2 or higher, return the frame number as the starting point.

Step 5: Otherwise, determine the gradient of the standard deviation values within a window (three or four adjacent frames).

Step 6: If the gradients are all positive, return the frame number as the starting point, else go back to Step 1.
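The six steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that the per-P-frame mean, standard deviation, and motion intensity values have already been extracted into plain lists, that the window begins at the candidate frame and runs forward, and that "go back to Step 1" means resuming the search at the next P frame. The names `find_play_start` and `all_gradients_positive` are hypothetical.

```python
from typing import Optional, Sequence

MEAN_THRESHOLD = 4.0      # Step 1 threshold quoted in the text
INTENSITY_THRESHOLD = 2   # Step 4 threshold quoted in the text


def all_gradients_positive(values: Sequence[float], start: int, window: int) -> bool:
    """Return True if values rise strictly across `window` frames from `start`.

    Assumption: the window runs forward from the candidate frame.
    """
    segment = values[start:start + window]
    if len(segment) < window:
        return False  # not enough frames left to fill the window
    return all(later > earlier for earlier, later in zip(segment, segment[1:]))


def find_play_start(
    mu: Sequence[float],
    sigma: Sequence[float],
    intensity: Sequence[float],
    window: int = 3,
) -> Optional[int]:
    """Return the index (within the P-frame sequence) of the estimated play start.

    mu, sigma and intensity hold one value per P frame. Returns None when no
    frame satisfies the conditions.
    """
    for i in range(len(mu)):
        # Step 1: the mean must reach the threshold.
        if mu[i] < MEAN_THRESHOLD:
            continue
        # Steps 2-3: the mean must be rising over the whole window.
        if not all_gradients_positive(mu, i, window):
            continue
        # Step 4: sufficiently high motion intensity confirms the candidate.
        if intensity[i] >= INTENSITY_THRESHOLD:
            return i
        # Steps 5-6: otherwise the standard deviation must be rising too.
        if all_gradients_positive(sigma, i, window):
            return i
        # Neither check passed: resume the search (back to Step 1).
    return None
```

The returned index refers to a position in the P-frame sequence, so mapping it back to an absolute frame number depends on the GOP structure of the stream, which is not specified here.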

7.6.1.2 Evaluation of Play Event Detection Algorithm

The play event detection algorithm was tested on the American football video shot database, which consists of 200 video shots taken from four different games and four different networks. To measure the performance of the algorithm, a ground truth had to be established for the starting point of the play event within each video shot. This was accomplished by having an observer manually mark the frame number that best represented the starting point of the play event.

Results were compared by computing the delta between the ground-truth frame number and the frame number estimated by the algorithm. This delta then had to be interpreted in the time domain. That is, we need to determine whether the algorithm estimates the starting point too early or only after a certain delay.

Since the MPEG-1 video in the database has a frame rate of 30 frames/s, a histogram with a bin size of 30 frames gives a general idea of how far the estimated frame numbers are from the ground truth in the time domain. Figure 7.23 shows a histogram of the number of shots within each time unit. Negative time units represent early detection and positive time units represent delayed detection.
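The binning described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' evaluation code. It assumes the delta is computed as estimated minus ground truth (so that negative values mean early detection), that an exact match falls in unit 0, and that every further block of up to 30 frames counts as one more time unit. The function names are hypothetical.

```python
from collections import Counter
from typing import Sequence

FRAMES_PER_SECOND = 30  # frame rate quoted in the text


def time_unit(delta_frames: int, bin_size: int = FRAMES_PER_SECOND) -> int:
    """Map a signed frame delta to a signed time unit.

    Assumption: 0 means an exact match; 1..30 frames late is unit 1,
    31..60 frames late is unit 2, and early detections mirror this
    with negative units.
    """
    if delta_frames == 0:
        return 0
    magnitude = (abs(delta_frames) + bin_size - 1) // bin_size  # ceiling division
    return magnitude if delta_frames > 0 else -magnitude


def delta_histogram(
    ground_truth: Sequence[int],
    estimated: Sequence[int],
    bin_size: int = FRAMES_PER_SECOND,
) -> Counter:
    """Count shots per time unit; negative units are early detections."""
    return Counter(
        time_unit(est - truth, bin_size)
        for truth, est in zip(ground_truth, estimated)
    )


def fraction_within(
    ground_truth: Sequence[int],
    estimated: Sequence[int],
    tolerance_frames: int = FRAMES_PER_SECOND,
) -> float:
    """Share of shots whose estimate lies within the given frame tolerance."""
    hits = sum(
        abs(est - truth) <= tolerance_frames
        for truth, est in zip(ground_truth, estimated)
    )
    return hits / len(ground_truth)
```

With made-up frame numbers, `delta_histogram([100, 200, 300, 400], [100, 215, 260, 461])` places one shot in each of the units 0, 1, -2 and 3, and `fraction_within` on the same data gives 0.5.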

From Fig. 7.23, we can see that the algorithm detects the starting points of the play with 83 % accuracy. That is, 166 of the 200 video shots in the database had the starting points detected within ±1 s of the original starting point. The accuracy of the algorithm can be increased to 86.5 % by increasing the window size from three
