Database Reference
In-Depth Information
8.3.3
Query Processing on the P2P Network
8.3.3.1
Query Indexing
In the query processing stage (cf. Fig. 8.2 b), the query feature vector is mapped
into the SOTM, similarly as in the indexing stage. However, the multiple-cluster
mapping method is employed here to improve retrieval accuracy. Figure 8.4
illustrates a simplified example of dividing data samples (represented in dots) into
six clusters, where the cluster centers are denoted by crossed-circles. If the query
image (represented as a triangle) is located close to the boundary between multiple
clusters, retrieval precision drops if the search is done for only the closest cluster. As
shown in Fig. 8.4 , relevant images may reside under multiple clusters (such as the
samples covered by the circle surrounding the query). Therefore, if the query image
is located at equal or nearly equal distance to multiple cluster centers, multiple
clusters should be chosen to avoid a significant degradation in the retrieval precision.
When a node issues a query, it converts the query image to a set of cluster IDs,
and then sends the query message to the nodes responsible for the cluster IDs. In
order to obtain the cluster IDs for the query, a multiple-cluster mapping denoted
by:
d
Z k
R
is employed. This is done by vector quantization of the query feature
vector, f q :
Q
(
f q )= {
clusID 1 ,
clusID 2 ,...,
clusID k }
(8.5)
where clusID 1 is the closest (winning) cluster to the query, i.e.,
arg mi i f q
w i
clusID 1
(8.6)
and clusID 2 and clusID k are respectively the cluster ID of the second and the k -th
neighbor of the winning cluster.
Afterwards,
the
node
sends
the
query
messages
in
the
form
of
<
key i ,
IP address
>
, where i
=
1
,
2
,...,
k , and key i is obtained by:
key i =
SHA1
(
clusID i ) ,
i
=
1
,
2
,...,
k
(8.7)
Using the query routing scheme in the Chord, each query message is only forwarded
to the node responsible for the keys. The number of query messages depends on the
parameter k . Increasing k also expands the number of clusters that are close to the
query, and thus, increase the probabilities of finding the relevant images.
Figure 8.5 shows the precision results as a function of k , the number of clusters
selected for an indexing query in Eq. ( 8.5 ). The database was the Corel database
[ 224 ], containing 40,000 images. Each image was characterized by color histogram,
color moment, Gabor wavelet for texture, and Fourier descriptor, as discussed in
Chap. 2 (Table 2.7 ). The application of SOTM resulted in 233 clusters (the number
of clusters was automatically generated) [ 359 ]. The figure shows the average
Search WWH ::




Custom Search