7.3.3 Demonstration
The CNN news video database described in Table 3.9 was used for the evaluation
of the video characterization and retrieval methods discussed in Sect. 3.5.3. The
experiment was designed to demonstrate that the TFM can be adapted for retrieval
beyond the shot level. There were 844 video shots in this database. Following the
timeline of the original unsegmented video, the shots were joined into meaningful
groups and stories. Although an automatic technique is available for detecting news
stories [190, 191], this was done manually to ensure the quality of the segmented
videos used for this experiment. Three feature databases were created to describe
the videos at the three levels. The lengths of the video clips were between 0.5 and
43.5 s at the group level, and between 5.7 and 180.3 s at the story level.
In order to retrieve the video groups, six sets of video intervals,
$\{(I_{1,\mathrm{Shot}}, I_{1,\mathrm{Group}})_{q_1}, \ldots, (I_{6,\mathrm{Shot}}, I_{6,\mathrm{Group}})_{q_6}\}$,
were obtained, each from a different story. Within each set, the shot interval
$I_{i,\mathrm{Shot}}$ used for querying was one part of the group interval
$I_{i,\mathrm{Group}}$. This allows a comparison of the performance between
query-by-video-shot and query-by-video-group. It is noted that the lengths of the
queries are as follows:
$\{(1\,\mathrm{s}, 1.9\,\mathrm{s})_{q_1}, (2.1\,\mathrm{s}, 3.3\,\mathrm{s})_{q_2}, (2.4\,\mathrm{s}, 12.3\,\mathrm{s})_{q_3}, (15.3\,\mathrm{s}, 39.3\,\mathrm{s})_{q_4}, (2.8\,\mathrm{s}, 4.5\,\mathrm{s})_{q_5}, (1.3\,\mathrm{s}, 3.5\,\mathrm{s})_{q_6}\}$.
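As a minimal sketch of how the six query pairs could be held in code, the snippet below stores the (shot, group) lengths quoted in the text and checks that each shot query is no longer than the group that contains it. The names `QUERY_LENGTHS` and `shot_fraction` are illustrative assumptions, not part of the original system.

```python
# Query lengths in seconds as listed in the text: (shot, group) for q1..q6.
# The dictionary layout and names are assumptions made for illustration.
QUERY_LENGTHS = {
    "q1": (1.0, 1.9),
    "q2": (2.1, 3.3),
    "q3": (2.4, 12.3),
    "q4": (15.3, 39.3),
    "q5": (2.8, 4.5),
    "q6": (1.3, 3.5),
}


def shot_fraction(shot_s, group_s):
    """Return the fraction of the group interval covered by the shot query."""
    if group_s <= 0:
        raise ValueError("group length must be positive")
    return shot_s / group_s


if __name__ == "__main__":
    for name, (shot_s, group_s) in QUERY_LENGTHS.items():
        # The shot interval is part of the group interval, so it cannot be longer.
        assert shot_s <= group_s
        frac = shot_fraction(shot_s, group_s)
        print(f"{name}: shot {shot_s:.1f} s, group {group_s:.1f} s, fraction {frac:.2f}")
```

The fraction gives a rough sense of how much of the group each shot query covers, which varies considerably across the six sets.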
Figure 7.4a, b shows the precision versus recall curves for all six sets of
test queries, resulting from the retrieval of the video groups. Figure 7.4c shows
a comparison between two querying methods: shot-to-group (STG) and group-
to-group (GTG). Evidently, the TFM exhibits good accuracy for video group
retrieval. We have an average precision of 90 % at 50 % recall, and more than 60 % at
100 % recall. It can be observed that querying by GTG provides higher precision at
lower recall levels, while STG is superior at higher recall levels. This is because
video intervals at the group level are usually longer and contain more information
than those at the shot level. On the other hand, a video shot usually contains less
information, but can pinpoint the relevance for ranking a video at higher recall levels.
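The precision figures quoted at fixed recall levels can be illustrated with a short sketch. It assumes each query yields a ranked list of binary relevance labels (1 for relevant, 0 otherwise) that contains every relevant item, and it uses the standard definition of precision at the first rank where the target recall is reached. The function names and the example rankings are hypothetical and do not reproduce the experiment's data.

```python
def precision_at_recall(ranked_relevance, recall_level):
    """Precision at the first rank where recall reaches recall_level.

    ranked_relevance: list of 0/1 labels in ranked order (assumed to include
    every relevant item, so that 100 % recall is reachable).
    recall_level: target recall in the interval (0, 1].
    """
    if not 0 < recall_level <= 1:
        raise ValueError("recall_level must be in (0, 1]")
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        raise ValueError("ranking contains no relevant items")
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        hits += is_relevant
        if hits / total_relevant >= recall_level:
            return hits / rank
    raise ValueError("target recall was not reached")


def mean_precision_at_recall(rankings, recall_level):
    """Average precision_at_recall over several queries."""
    values = [precision_at_recall(r, recall_level) for r in rankings]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical rankings for two queries, for illustration only.
    rankings = [
        [1, 1, 0, 1, 0, 0, 1],
        [1, 0, 1, 1],
    ]
    for level in (0.5, 1.0):
        print(level, round(mean_precision_at_recall(rankings, level), 4))
```

Averaging over queries in this way corresponds to reporting a single precision value per recall level, as in the 50 % and 100 % recall figures above.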
Figure 7.5 shows a group retrieval session, where the query clip contained two
shots with a total length of 1.8 s. For convenience, each of the retrieved clips is
represented by a set of frames. It can be seen that the top five retrieved video clips
are all relevant and are actually from the same story. A precise ranking of the relative
similarity (to the query) among these retrievals may also be observed.
A possible application of video story retrieval is to use a news headline
to retrieve the full news story. This enables a user to go directly to the full
story from the headline of interest. Five news stories that were introduced with at
least two headlines (summarized in Table 7.1) were examined. Five shots and
five video groups from the news headlines were then used for querying. Figure 7.6
shows the system performance in retrieving the news stories when the shots
and groups from news headlines are employed as queries. It is observed that all
relevant video segments related to the same story were retrieved with close to 50 %
precision (at 100 % recall). This means that, on average, all relevant video intervals can be