Indexing, Object Segmentation, and Event Detection in News and Sports Videos - Multimedia Database Retrieval: Technology and Applications


Let $x_i$ be a real-valued visual descriptor of frame $f_i$. A video interval $I = [f_s, f_e]$ at any level is characterized by a set of video descriptors $D_I = \{(x_s, f_s), (x_{s+1}, f_{s+1}), \ldots, (x_e, f_e)\}$. $D_I$ denotes the set of primary descriptors of $I$. It will be used for obtaining a secondary descriptor used for the video indexing.

Intuitively, a video descriptor database $VD$ for a video $F$ is defined as a set of video descriptors for $F$ and has the following form:

$$VD = \{(D_{I_1}, I_1), (D_{I_2}, I_2), \ldots, (D_{I_J}, I_J)\} \qquad (7.12)$$

Based on Eq. (7.12), video descriptor databases at the shot, group, and story levels are defined as follows:

$$VD_{Shot} = \{(D_{I_i}, I_i) \mid I_i \in I_{Shot}(F)\} \qquad (7.13)$$

$$VD_{Group} = \{(D_{I_i}, I_i) \mid I_i \in I_{Group}(F)\} \qquad (7.14)$$

$$VD_{Story} = \{(D_{I_i}, I_i) \mid I_i \in I_{Story}(F)\} \qquad (7.15)$$

In the above definitions, $D_I$ is regarded as the set of primary descriptors, and it is only used to characterize video at the frame level. To support video indexing, it is reorganized at a higher level into a set of secondary descriptors.
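The definitions above can be sketched as plain data structures. This is a minimal illustration, not the book's implementation: the `Interval` class, its `level` tag, and the helper names `primary_descriptors`, `build_vd`, and `vd_at_level` are all hypothetical, and frame descriptors are assumed to be small tuples of floats.

```python
from dataclasses import dataclass
from typing import List, Tuple

Descriptor = Tuple[float, ...]                      # x_i: feature vector of one frame
PrimaryDescriptors = List[Tuple[Descriptor, int]]   # D_I: (x_i, frame index) pairs


@dataclass(frozen=True)
class Interval:
    """A video interval I = [f_s, f_e]; the level tag is an assumption."""
    start: int   # f_s
    end: int     # f_e (inclusive)
    level: str   # "shot", "group", or "story"


def primary_descriptors(frames: List[Descriptor], interval: Interval) -> PrimaryDescriptors:
    """Build D_I = {(x_s, f_s), ..., (x_e, f_e)} for one interval."""
    return [(frames[f], f) for f in range(interval.start, interval.end + 1)]


def build_vd(frames: List[Descriptor],
             intervals: List[Interval]) -> List[Tuple[PrimaryDescriptors, Interval]]:
    """Build VD = {(D_I1, I_1), ..., (D_IJ, I_J)} as in Eq. (7.12)."""
    return [(primary_descriptors(frames, iv), iv) for iv in intervals]


def vd_at_level(vd: List[Tuple[PrimaryDescriptors, Interval]],
                level: str) -> List[Tuple[PrimaryDescriptors, Interval]]:
    """Select the entries of one level, mirroring Eqs. (7.13)-(7.15)."""
    return [(d, iv) for d, iv in vd if iv.level == level]


# Toy video with four frames, two shots, and one story spanning both shots.
frames = [(0.1, 0.9), (0.2, 0.8), (0.7, 0.3), (0.6, 0.4)]
intervals = [Interval(0, 1, "shot"), Interval(2, 3, "shot"), Interval(0, 3, "story")]
vd = build_vd(frames, intervals)
vd_shot = vd_at_level(vd, "shot")
```

Here `vd_shot` plays the role of $VD_{Shot}$: it keeps only the pairs whose interval belongs to the shot-level segmentation of the video.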

7.3.2 Indexing and Retrieval of News Video

For a video descriptor database $VD = \{(D_{I_1}, I_1), \ldots, (D_{I_j}, I_j), \ldots, (D_{I_J}, I_J)\}$, where $D_I = \{(x_s, f_s), (x_{s+1}, f_{s+1}), \ldots, (x_e, f_e)\}$, the indexing process produces a secondary video descriptor for each interval $I_j$, specified as

$$D_{I_j} \equiv v_j = [w_{j1}, \ldots, w_{jr}, \ldots, w_{jR}]^t.$$

The weights $w_{jr}$ are positive and non-binary. They are obtained by the template frequency model (TFM) discussed in Sect. 3.5, Chap. 3.

Since the template-frequency model considers all the visual contents occurring in a video sequence (through the weights $w_{jr}$), this indexing technique can characterize video sequences at different levels, from shots and groups of shots to stories. The system can therefore support user access across levels, as depicted in Fig. 7.3: (a) shot-to-shot, (b) shot-to-group, (c) group-to-group, (d) group-to-story, and (e) shot-to-story.

This architecture accommodates retrieval from lower to higher levels, e.g., retrieval of a video group or story using a query from the shot or group level. A user generally seeks information across the different levels defined in the segmented videos. To satisfy this demand, the video story at a higher level is expected to contain most of the visual contents occurring at the lower one. For instance, to retrieve a full news story, a short shot that contains the anchor of the news story can be used as a query.
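The cross-level retrieval idea can be illustrated with a small sketch. The exact TFM weighting is defined in Sect. 3.5 and is not reproduced here, so the code below assumes a tf-idf-style stand-in: each frame descriptor is assigned to its nearest visual template, template counts are normalized per interval, and templates that occur in fewer intervals receive a larger weight. The template set, the smoothing term in the logarithm, the cosine similarity measure, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nearest_template(x: Vector, templates: List[Vector]) -> int:
    """Index r of the template closest to x (squared Euclidean distance)."""
    return min(range(len(templates)),
               key=lambda r: sum((a - b) ** 2 for a, b in zip(x, templates[r])))


def secondary_descriptors(intervals: Dict[str, List[Vector]],
                          templates: List[Vector]) -> Dict[str, List[float]]:
    """Map each interval I_j to v_j = [w_j1, ..., w_jR] (assumed tf-idf-style weights).

    Every interval is assumed to contain at least one frame.
    """
    counts = {name: Counter(nearest_template(x, templates) for x in frames)
              for name, frames in intervals.items()}
    n = len(intervals)
    # Number of intervals in which each template occurs at least once.
    doc_freq = Counter(r for c in counts.values() for r in c)
    vectors = {}
    for name, c in counts.items():
        max_freq = max(c.values())
        # Weight is zero when template r never occurs in this interval.
        vectors[name] = [
            (c[r] / max_freq) * math.log(1 + n / doc_freq[r]) if c[r] else 0.0
            for r in range(len(templates))
        ]
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query: str, candidates: List[str], vectors: Dict[str, List[float]]) -> List[str]:
    """Order candidate intervals by similarity to the query interval."""
    return sorted(candidates, key=lambda name: cosine(vectors[query], vectors[name]),
                  reverse=True)


# Hypothetical templates and intervals: a short anchor shot queries two stories.
templates = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
intervals = {
    "shot_anchor": [(0.1, 0.0), (0.0, 0.1)],
    "story_a": [(0.1, 0.0), (0.0, 0.1), (0.9, 1.0), (0.0, 0.1)],
    "story_b": [(1.0, 0.9), (0.1, 1.0), (0.9, 0.9)],
}
vectors = secondary_descriptors(intervals, templates)
ranking = rank("shot_anchor", ["story_a", "story_b"], vectors)
```

Because shots, groups, and stories all map to vectors of the same length $R$, a shot-level query can be compared directly with story-level descriptors. In this toy data the anchor shot ranks `story_a` first, since that story contains the same template that dominates the query shot.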
