Scalable Video Genre Classification and Event Detection - Multimedia Database Retrieval: Technology and Applications - page 257


Table 9.3 Comparison of view classification techniques in the literature, emphasizing feature utilization and classification methods

                                  |------------ Global features ------------|
Year  Reference  Nature of data   Color based  Texture based  Others (yes/innov)  Local-feature based  View classification method
2001  [243]      Soccer           Yes          No             No                  No                   Thresholding (S)
2002  [244]      Soccer           Yes          No             Yes                 No                   Morphological operations (S)
2003  [282]      Four sports      Yes          Yes            Innov               No                   Decision tree (S)
2004  [283]      Soccer           Yes          Yes            Yes                 No                   Decision tree (S)
2007  [280]      Four sports      Yes          Yes            Yes                 No                   Spectral clustering (UnS)
2008  [286]      Soccer           Yes          Yes            Yes                 No                   Neural network (S)
2008  [281]      Three sports     Yes          No             No                  No                   Spectral-division algorithm (UnS)
2009  [285]      Soccer           Yes          No             Yes                 No                   Decision tree (S)

In the “Global features” column, under the “Others (yes/innov)” category, “Yes” means that global features other than color and texture are used but are not innovative, while “Innov” means that newly designed features are used. In the “View classification method” column, S indicates a supervised method and UnS indicates an unsupervised method.

PLSA relies on the likelihood function of multinomial sampling and aims at an explicit maximization of the predictive power of the model. Mapping the PLSA plate notation in Fig. 9.3 onto the view classification application, the observed state $w$ is a codeword drawn from a predefined codebook of size $M$. An individual video frame is denoted by $d$, and the total number of training frames is $N$. The latent state $z$ is the view type, and the parameter $K$ is the total number of view classes; in this work, $K$ equals four. The likelihood function is given in Eq. (9.1). The probability distribution is defined as $p(w_i \mid d_j)$, where $w_i$ is an individual codeword and $d_j$ is a training frame. This distribution can be represented as a sum of products of two distributions, $p(w_i \mid z_k)$ and $p(z_k \mid d_j)$. The former can be interpreted as the influence of a view type on the codewords, while the latter is the probability of a particular view type given a training frame. The number of times codeword $w_i$ appears in frame $d_j$ is denoted by $n(w_i, d_j)$. The maximum a posteriori (MAP) estimate $z^{*}$, shown in Eq. (9.2), is obtained by using expectation maximization (EM).

\[
L = \prod_{i=1}^{M} \prod_{j=1}^{N} p(w_i \mid d_j)^{\,n(w_i, d_j)}
  = \prod_{i=1}^{M} \prod_{j=1}^{N} \left[ \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j) \right]^{n(w_i, d_j)}
\tag{9.1}
\]

\[
z^{*} = \arg\max_{z} \; p(z \mid d)
\tag{9.2}
\]
