Thus, the video interval $V$ can be described by a set of descriptors $D_V = \{(x_1, f_1), \cdots, (x_m, f_m), \cdots, (x_M, f_M)\}$. The indexing of each video frame is obtained by a vector quantization process. Specifically, let

$$C = \{\, (g_r, l_r) \mid g_r \in \mathbb{R}^P,\ l_r \in \{1, 2, \ldots, R\},\ r = 1, 2, \ldots, R \,\}$$

be a set of templates (or codewords) $g_r$, where $l_r$ is the label of the $r$-th template. This set is previously generated and optimized by a competitive learning algorithm [331] (illustrated in Table 3.7). The vector $x_m$ is mapped by $\mathbb{R}^P \to C$ on the Voronoi space, i.e., quantizing the input vector by:
$$Q(x_m) = \{\, l^{x_m}_{r},\ l^{x_m}_{r,1},\ \ldots,\ l^{x_m}_{r,(\omega-1)} \,\} \qquad (3.30)$$
where $Q$ is the vector quantization function, $l^{x_m}_{r}$ is the label of the closest template, i.e.,

$$r = \arg\min_{r} \| x_m - g_r \| \qquad (3.31)$$

and $l^{x_m}_{r,1}$ and $l^{x_m}_{r,(\omega-1)}$ are the labels of the first and the last neighbors of the winning template $g_r$, respectively.
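Equations (3.30) and (3.31) can be sketched in a few lines of code. The following is a minimal illustration, not the book's implementation; it assumes the $\omega - 1$ "neighbors" of the winning template are its next-best matches under Euclidean distance, as the surrounding text suggests, and the names (`multi_label_quantize`, `templates`, `omega`) are ours.

```python
import numpy as np

def multi_label_quantize(x_m, templates, omega):
    """Map a feature vector x_m to the labels of its omega closest
    templates (codewords), per Eqs. (3.30)-(3.31).

    x_m       : (P,) feature vector of one video frame
    templates : (R, P) matrix whose r-th row is codeword g_r
    omega     : number of labels to return (winner + omega-1 neighbors)
    """
    # Eq. (3.31): distance to every template; the winner is the arg-min.
    distances = np.linalg.norm(templates - x_m, axis=1)
    # Eq. (3.30): keep the winning label plus its omega-1 nearest
    # neighbors. Labels l_r are taken to be template indices 0..R-1 here.
    return np.argsort(distances)[:omega]
```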
Equation (3.30) obtains a multiple-label indexing that is designed to describe correlation information between the winning template and its neighbors. Figure 3.12 shows an example of this indexing process. Here we are interested not only in the best-match template, but also in the second (and up to the $\omega$-th) best match. Once a cell is selected, the $\omega - 1$ neighbors which have not yet been visited in the scan are then also included in the output label set. This allows for interpretation of the correlation information between the selected cell and its neighbors. Since a video sequence usually has very strong frame-to-frame correlation [102] due to the nature of time-sequence data, embedding correlation information through Eq. (3.30) offers a better description of video contents, and thus a means for more accurate discriminant analysis. For example, two consecutive frames which are visually similar may not be mapped into the same cell; rather, they may be mapped onto two cells in a neighborhood area, so that mapping through multiple labels using Eq. (3.30) maps two frames from the same class in the visual space into the same neighborhood area in feature space.
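To make the neighborhood effect concrete, here is a small usage example that reuses the `multi_label_quantize` sketch above. The codebook and "frames" are synthetic values invented purely for illustration: two feature vectors that are close in feature space typically receive overlapping (though not necessarily identical) label sets.

```python
rng = np.random.default_rng(0)
templates = rng.normal(size=(16, 8))            # R = 16 codewords in R^8

frame_a = rng.normal(size=8)
frame_b = frame_a + 0.05 * rng.normal(size=8)   # a visually similar frame

labels_a = multi_label_quantize(frame_a, templates, omega=3)
labels_b = multi_label_quantize(frame_b, templates, omega=3)

# Even if the single best-match cells differ, the omega-label sets
# usually intersect, placing both frames in the same neighborhood.
print(set(labels_a) & set(labels_b))
```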
The visual content of the video frame $f_m$ is therefore characterized by the membership of the label set, $\{\, l^{x_m}_{r},\ l^{x_m}_{r,1},\ \ldots,\ l^{x_m}_{r,(\omega-1)} \,\}$. The results of mapping all frames, $\{\, l^{x_m}_{r},\ l^{x_m}_{r,1},\ \ldots,\ l^{x_m}_{r,(\omega-1)} \,\}$, $\forall m \in \{1, \ldots, M\}$, i.e., from the mapping of the entire video interval $V_j$, are concatenated into a vector $v_j = [w_{j1}, \ldots, w_{jr}, \ldots, w_{jR}]$. The weight parameters are calculated by the TF$\times$IDF weighting scheme [323]:
$$w_{jr} = \frac{F_{jr}}{\max_r F_{jr}} \times \log \frac{N}{n_r} \qquad (3.32)$$
where the weight parameter $F_{jr}$ stands for the raw frequency of template $g_r$ in the video interval $V_j$, i.e., the number of times the label $r$ occurs in the label sets obtained from the frames of $V_j$.
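A sketch of the weighting step in Eq. (3.32) follows. The excerpt above does not define $N$ and $n_r$, so this code assumes the standard TF$\times$IDF reading [323]: $N$ is the total number of video intervals in the database and $n_r$ is the number of intervals whose label sets contain template $r$. The function name and argument layout are ours.

```python
import numpy as np

def tfidf_weights(interval_labels, R, N, n):
    """Turn one interval's concatenated labels into the weight vector
    v_j = [w_j1, ..., w_jR] of Eq. (3.32).

    interval_labels : iterable of labels produced by quantizing every
                      frame of the interval V_j (omega labels per frame)
    R               : codebook size
    N               : total number of video intervals in the database
    n               : (R,) array; n[r] = number of intervals in which
                      template r occurs (assumed IDF denominator; every
                      entry must be > 0)
    """
    # F_jr: raw frequency of template r within the interval.
    F = np.bincount(np.asarray(interval_labels), minlength=R).astype(float)
    # Eq. (3.32): normalized term frequency times inverse document frequency.
    return F / F.max() * np.log(N / n)
```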