8.5.1 System Architecture
The system adopts the two-stage search approach discussed in Fig. 8.10a, b as a means for discovering and interacting with other peers. Each node is assumed to have a local database containing video clips, each of which is indexed with a vector. Figure 8.17 shows the connection of peers after the social network has been formed. The search is started by the user node to discover its neighbors. The user node sends a packet containing the query vector $v_q$ to the other nodes in the list through Java socket programming. The retrieval process is conducted according to the sequence diagram illustrated in Fig. 8.18.
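As a rough illustration of this dispatch step, the Java sketch below sends a serialized query vector to each peer over a TCP socket. The class and packet names (QueryDispatcher, QueryPacket), the port argument, and the serialization format are illustrative assumptions, not the system's actual code.

```java
import java.io.ObjectOutputStream;
import java.net.Socket;
import java.util.List;

/** Minimal sketch of the query dispatch step: the user node opens a TCP
 *  connection to each peer in its neighbor list and writes the query
 *  vector. Peer addresses, port, and the QueryPacket layout are
 *  hypothetical placeholders. */
public class QueryDispatcher {

    /** Hypothetical wire format: a serializable query vector v_q. */
    public record QueryPacket(double[] queryVector) implements java.io.Serializable {}

    public static void sendQuery(List<String> peerHosts, int port, double[] vq) {
        for (String host : peerHosts) {
            try (Socket socket = new Socket(host, port);
                 ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream())) {
                out.writeObject(new QueryPacket(vq));   // one packet per peer
            } catch (java.io.IOException e) {
                System.err.println("peer " + host + " unreachable: " + e.getMessage());
            }
        }
    }
}
```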
In the diagram, once the query vector is received, it is used to search for similar video files locally at each peer node. The retrieval results are used to modify the query vector automatically, and the modified query, $v_{q,i}$, $i = 1, 2, \ldots, n$, is routed back to the user node, where $n$ is the total number of nodes. The user node then gathers all modified query vectors and uses them to adjust the components of the previous query vector. All steps are repeated with the new query vector, $v_q$. After several rounds of forward and backward signal propagation between the nodes, the improved retrieval results from each peer node are delivered to the user node.
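A minimal sketch of the aggregation step at the user node might look as follows; the component-wise average used to merge the returned vectors is a placeholder assumption, since the text does not spell out the exact update rule here.

```java
/** Sketch of the query-update step: the modified query vectors v_{q,i}
 *  returned by the n peers are merged into a new query v_q. The
 *  component-wise average is an illustrative placeholder for the
 *  system's actual update rule. */
public class QueryAggregator {

    public static double[] mergeQueries(double[][] modifiedQueries) {
        int dim = modifiedQueries[0].length;
        double[] vq = new double[dim];
        for (double[] vqi : modifiedQueries) {          // sum over the n peers
            for (int k = 0; k < dim; k++) vq[k] += vqi[k];
        }
        for (int k = 0; k < dim; k++) vq[k] /= modifiedQueries.length;
        return vq;                                      // new query for the next round
    }
}
```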
Figure 8.19 shows a snapshot of the retrieval process. The query video clips and the list of peers are displayed on the left panel. The retrieved video clips, shown on the right panel, are represented by their key frames. The retrieval results after each iteration of query modification are also available to the user.
8.5.2 Video Indexing on the P2P Network
A video file can be segmented into video clips, each of which may contain more than one shot. The TFM technique discussed in Chap. 3 is employed for indexing the video clips. The descriptor of a video clip is denoted by $VD = \{x_1, \ldots, x_i, \ldots, x_N\}$, where $x_i \in \mathbb{R}^P$ is the visual descriptor of the corresponding $i$-th frame, and $N$ is the total number of frames. The TFM utilizes vector quantization to assign each video frame to the best-matched visual template. A set of visual templates, $C = \{g_j \mid j = 1, 2, \ldots, J\}$, is generated by competitive learning (as explained in Table 3.7), where $g_j \in \mathbb{R}^P$ is the $j$-th visual template and $J$ is the total number of templates.
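The following sketch shows one common form of competitive learning, a winner-take-all update that pulls the nearest template toward each training frame. The initialization and learning-rate schedule are assumptions; the authoritative procedure is the one given in Table 3.7.

```java
import java.util.Random;

/** Sketch of generating the template set C by competitive learning:
 *  each training frame pulls its winning (nearest) template toward
 *  itself. Initialization and learning-rate decay are illustrative
 *  assumptions. */
public class TemplateLearner {

    public static double[][] learnTemplates(double[][] frames, int J, int epochs) {
        Random rnd = new Random(0);
        int P = frames[0].length;
        double[][] g = new double[J][P];
        for (int j = 0; j < J; j++)                      // init templates from random frames
            g[j] = frames[rnd.nextInt(frames.length)].clone();
        for (int e = 0; e < epochs; e++) {
            double rate = 0.5 * (1.0 - (double) e / epochs);  // decaying learning rate
            for (double[] x : frames) {
                int win = nearest(g, x);                 // winner-take-all competition
                for (int k = 0; k < P; k++)
                    g[win][k] += rate * (x[k] - g[win][k]);
            }
        }
        return g;
    }

    static int nearest(double[][] g, double[] x) {
        int best = 0; double bestDist = Double.MAX_VALUE;
        for (int j = 0; j < g.length; j++) {
            double d = 0;
            for (int k = 0; k < x.length; k++) d += (x[k] - g[j][k]) * (x[k] - g[j][k]);
            if (d < bestDist) { bestDist = d; best = j; }
        }
        return best;
    }
}
```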
The mapping of the $i$-th frame is given by the labeling of its feature vector, i.e.,

$$l(x_i) = \big[\, l_{x_i,1},\ l_{x_i,2},\ \ldots,\ l_{x_i,\omega} \,\big] \qquad (8.15)$$

$$l_{x_i,j} = \arg\min_{j} \left\| x_i - g_j \right\| \qquad (8.16)$$

where $l_{x_i,1}, l_{x_i,2}, \ldots, l_{x_i,\omega}$ are the labels of the $\omega$ best-matching templates.
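A small Java sketch of this labeling, under the assumption that the $\omega$ best-matching templates are simply those at the smallest Euclidean distance from the frame feature $x_i$:

```java
import java.util.Comparator;
import java.util.stream.IntStream;

/** Sketch of the frame labeling in Eqs. (8.15)-(8.16): rank all
 *  templates g_j by Euclidean distance to the frame feature x and keep
 *  the indices of the omega best matches as the label vector l(x). */
public class FrameLabeler {

    public static int[] label(double[] x, double[][] templates, int omega) {
        return IntStream.range(0, templates.length)
                .boxed()
                .sorted(Comparator.comparingDouble(j -> distance(x, templates[j])))
                .limit(omega)
                .mapToInt(Integer::intValue)
                .toArray();                              // [l_{x,1}, ..., l_{x,omega}]
    }

    static double distance(double[] x, double[] g) {
        double d = 0;
        for (int k = 0; k < x.length; k++) d += (x[k] - g[k]) * (x[k] - g[k]);
        return Math.sqrt(d);
    }
}
```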
By mapping all $x_i$, $i = 1, \ldots, N$, in the input video clip, the resulting labels,