Table 9.3 Comparison of view classification techniques in the literature, emphasizing feature
utilization and classification methods
Year | Reference | Nature of data | Global: color based | Global: texture based | Global: others (yes/innov) | Local-feature based | View classification method
2001 | [243] | Soccer       | Yes | No  | No    | No | Thresholding (S)
2002 | [244] | Soccer       | Yes | No  | Yes   | No | Morphological operations (S)
2003 | [282] | Four sports  | Yes | Yes | Innov | No | Decision tree (S)
2004 | [283] | Soccer       | Yes | Yes | Yes   | No | Decision tree (S)
2007 | [280] | Four sports  | Yes | Yes | Yes   | No | Spectral clustering (UnS)
2008 | [286] | Soccer       | Yes | Yes | Yes   | No | Neural network (S)
2008 | [281] | Three sports | Yes | No  | No    | No | Spectral-division algorithm (UnS)
2009 | [285] | Soccer       | Yes | No  | Yes   | No | Decision tree (S)
In the “Global features” columns, the “Others (yes/innov)” category reads as follows: “Yes” means that
global features other than color and texture are used but are not innovative, while “Innov” means that
newly designed features are used. In the “View classification method” column, S indicates a
supervised method and UnS an unsupervised method.
PLSA relies on the likelihood function of multinomial sampling and aims at an explicit
maximization of the predictive power of the model. When the PLSA plate notation in Fig. 9.3
is applied to view classification, the observed variable $w$ is a codeword from a predefined
codebook of size $M$. An individual video frame is denoted by $d$, and $N$ is the total number
of training frames. The latent variable $z$ is the view type, and $K$ is the total number of
view classes; in this work, $K = 4$. The likelihood function is given in Eq. (9.1). The
probabilistic distribution is defined as $p(w_i \mid d_j)$, where $w_i$ is an individual
codeword and $d_j$ is a training frame. This distribution can be represented as a sum of
products of two distributions, $p(w_i \mid z_k)$ and $p(z_k \mid d_j)$. The former is
interpreted as the impact of a view type on the codewords, while the latter is the probability
of a particular view type given a training frame. The number of times codeword $w_i$ appears
in frame $d_j$ is denoted as $n(w_i, d_j)$. The model parameters are estimated with the
expectation-maximization (EM) algorithm, and the view type of a frame $d$ is then given by the
maximum a posteriori (MAP) estimate in Eq. (9.2).
$$
L = \prod_{i=1}^{M} \prod_{j=1}^{N} p(w_i \mid d_j)^{n(w_i, d_j)}
  = \prod_{i=1}^{M} \prod_{j=1}^{N} \left[ \sum_{k=1}^{K} p(w_i \mid z_k)\, p(z_k \mid d_j) \right]^{n(w_i, d_j)}
\tag{9.1}
$$
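To make Eq. (9.1) concrete, the following Python sketch builds the count matrix $n(w_i, d_j)$ by assigning each local descriptor of a frame to its nearest codeword, and then evaluates the logarithm of the likelihood in Eq. (9.1). It is a minimal illustration under stated assumptions: the codebook is taken as given, and the descriptor type, the Euclidean nearest-codeword assignment, and the function names `codeword_counts` and `plsa_log_likelihood` are hypothetical rather than taken from the original work.

```python
import numpy as np


def codeword_counts(frames, codebook):
    """Build the (M, N) count matrix counts[i, j] = n(w_i, d_j).

    frames:   list of N arrays, each (num_descriptors, D): local descriptors of one frame
              (hypothetical input format; the source does not specify the descriptors)
    codebook: (M, D) array of codeword centres, assumed to be learned beforehand
    """
    M = codebook.shape[0]
    counts = np.zeros((M, len(frames)), dtype=np.int64)
    for j, desc in enumerate(frames):
        # Assumption: each descriptor maps to its nearest codeword (squared Euclidean distance).
        d2 = ((desc[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        counts[:, j] = np.bincount(nearest, minlength=M)
    return counts


def plsa_log_likelihood(counts, p_w_given_z, p_z_given_d):
    """log L of Eq. (9.1).

    counts:      (M, N) matrix n(w_i, d_j)
    p_w_given_z: (M, K), column k holds p(w | z_k)
    p_z_given_d: (K, N), column j holds p(z | d_j)
    """
    # p(w_i | d_j) = sum_k p(w_i | z_k) p(z_k | d_j): a matrix product of shape (M, N).
    p_w_given_d = p_w_given_z @ p_z_given_d
    # Cells with zero count contribute nothing; masking them also avoids 0 * log(0).
    mask = counts > 0
    return float((counts[mask] * np.log(p_w_given_d[mask])).sum())
```

Working with the log-likelihood rather than the product in Eq. (9.1) avoids numerical underflow when the counts are large.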
$$
\hat{z} = \arg\max_{z} \, p(z \mid d)
\tag{9.2}
$$
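The EM estimation and the MAP decision of Eq. (9.2) can be sketched for PLSA as follows. The E-step computes the posterior $p(z_k \mid w_i, d_j)$, the M-step re-estimates $p(w_i \mid z_k)$ and $p(z_k \mid d_j)$ from the counts weighted by that posterior, and Eq. (9.2) then picks the most probable view type per frame. This is a generic NumPy sketch, not the authors' implementation: the random initialization, the fixed iteration count, and the names `plsa_em`, `map_view`, and `fold_in` are assumptions. In particular, `fold_in` uses the common PLSA "folding-in" practice (EM with $p(w \mid z)$ held fixed) to classify an unseen frame, which the text does not specify.

```python
import numpy as np

EPS = 1e-300  # guards divisions when a row or column happens to sum to zero


def plsa_em(counts, K, n_iter=100, seed=0):
    """Fit PLSA by EM on an (M, N) count matrix counts[i, j] = n(w_i, d_j).

    Returns p_w_z (M, K), column k = p(w | z_k), and p_z_d (K, N), column j = p(z | d_j).
    """
    rng = np.random.default_rng(seed)
    M, N = counts.shape
    # Assumption: random positive initialization, normalized into proper distributions.
    p_w_z = rng.random((M, K))
    p_w_z /= p_w_z.sum(axis=0, keepdims=True)
    p_z_d = rng.random((K, N))
    p_z_d /= p_z_d.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # E-step: p(z_k | w_i, d_j) is proportional to p(w_i | z_k) p(z_k | d_j); shape (M, K, N).
        joint = p_w_z[:, :, None] * p_z_d[None, :, :]
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), EPS)
        # M-step: weight the posteriors by the observed counts n(w_i, d_j).
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=2)
        p_w_z /= np.maximum(p_w_z.sum(axis=0, keepdims=True), EPS)
        p_z_d = weighted.sum(axis=0)
        p_z_d /= np.maximum(p_z_d.sum(axis=0, keepdims=True), EPS)
    return p_w_z, p_z_d


def map_view(p_z_d):
    """Eq. (9.2): index of the most probable view type for every frame (column)."""
    return p_z_d.argmax(axis=0)


def fold_in(new_counts, p_w_z, n_iter=50):
    """Estimate p(z | d) for an unseen frame's count vector, keeping p(w | z) fixed."""
    K = p_w_z.shape[1]
    p_z = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        joint = p_w_z * p_z[None, :]  # (M, K)
        post = joint / np.maximum(joint.sum(axis=1, keepdims=True), EPS)
        p_z = (new_counts[:, None] * post).sum(axis=0)
        p_z /= max(p_z.sum(), EPS)
    return p_z
```

For example, `p_w_z, p_z_d = plsa_em(counts, K=4)` followed by `map_view(p_z_d)` assigns each training frame to one of the four view classes, and `fold_in(n_new, p_w_z).argmax()` does the same for the count vector `n_new` (a hypothetical name) of a new frame. Because EM only guarantees a local maximum of Eq. (9.1), running it from several seeds and keeping the highest log-likelihood is a common safeguard.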
 