search system requires significant bandwidth, since at the server end, receiving multiple query photos is far more demanding than receiving text queries within the standard uplink bandwidth of a search engine.
A scalable near-duplicate visual search system is typically developed on the basis of a Scalable Vocabulary Tree (SVT) [149]. For large databases, the SVT reduces the computational cost and improves the performance of mobile landmark recognition. Visual vocabulary models quantize local descriptors using K-means clustering [150], the vocabulary tree [149], or approximate K-means [151]. The Bag-of-Words (BoW) model with an inverted indexing structure is usually built on top of the SVT for image descriptors [149, 152-154]. A more compact descriptor can be obtained from the bag-of-features histogram by encoding the position differences of its non-zero bins [155]. The inverted index structure of the SVT can also be further compressed with arithmetic coding, reducing the memory and storage cost of maintaining a scalable visual search system.
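As a rough illustration of how these pieces fit together, the following sketch quantizes local descriptors with a small two-level vocabulary tree and accumulates per-image BoW histograms alongside an inverted index. It is only a minimal sketch: the branch factor, tree depth, descriptor dimensionality, and the use of scikit-learn's KMeans are assumptions made for illustration, not the configuration used in [149].

import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

def build_vocabulary_tree(descriptors, branch=4, depth=2, seed=0):
    # Recursively cluster the descriptors into a small vocabulary tree.
    if depth == 0 or len(descriptors) < branch:
        return None
    km = KMeans(n_clusters=branch, n_init=4, random_state=seed).fit(descriptors)
    children = [build_vocabulary_tree(descriptors[km.labels_ == b],
                                      branch, depth - 1, seed)
                for b in range(branch)]
    return {"centers": km.cluster_centers_, "children": children}

def quantize(tree, desc):
    # Greedily descend the tree; the leaf reached gives the visual word index.
    word, node = 0, tree
    while node is not None:
        b = int(np.argmin(np.linalg.norm(node["centers"] - desc, axis=1)))
        word = word * len(node["centers"]) + b
        node = node["children"][b]
    return word

# Toy data: each "image" is a set of local descriptors (SIFT-like vectors).
rng = np.random.default_rng(0)
images = [rng.normal(size=(200, 16)) for _ in range(5)]
tree = build_vocabulary_tree(np.vstack(images))

n_words = 4 ** 2                      # branch ** depth leaf visual words
inverted_index = defaultdict(set)     # visual word -> ids of images containing it
bow_histograms = []
for img_id, descs in enumerate(images):
    hist = np.zeros(n_words)
    for d in descs:
        w = quantize(tree, d)
        hist[w] += 1
        inverted_index[w].add(img_id)
    bow_histograms.append(hist / hist.sum())   # L1-normalised BoW histogram

At query time, only the inverted-index lists of the visual words present in the query need to be visited, which is what keeps retrieval scalable as the database grows.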
While the aforementioned methods have been the main approaches for constructing the SVT and the associated compact descriptors, this chapter presents a method for improving the performance of SVT-based landmark recognition. As in the previous works [156-160], the goal is to study the discriminative information of image patches in order to evaluate each patch's importance. This information constitutes a weighting scheme for the construction of the SVT and the BoW histogram features. In the conventional SVT and BoW, the local descriptors are assigned equal importance, and hence feature selection for visual word generation is underutilized. However, local descriptors on the foreground landmark should be given more importance, while local descriptors in the background act as outliers for recognition. Background content, such as sky and grass, is usually common to many different landmark categories; its importance should therefore be reduced when generating the BoW histogram [160].
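To make the idea concrete, the sketch below accumulates a BoW histogram in which each quantized descriptor contributes the saliency value sampled at its keypoint location rather than a constant count of one. The function name, the per-descriptor saliency lookup, and the toy data are assumptions made for illustration; the precise weighting used in [160] and in this chapter is defined in the following sections.

import numpy as np

def saliency_weighted_bow(words, keypoints, saliency_map, n_words):
    # Each local descriptor contributes its saliency value instead of a constant 1,
    # so descriptors on the salient foreground dominate the histogram.
    hist = np.zeros(n_words)
    h, w = saliency_map.shape
    for word, (x, y) in zip(words, keypoints):
        # Clamp the keypoint to the map bounds and read its saliency in [0, 1].
        sal = saliency_map[min(int(y), h - 1), min(int(x), w - 1)]
        hist[word] += sal
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy usage: three quantized descriptors over a 4x4 saliency map.
saliency = np.array([[0.1, 0.1, 0.2, 0.1],
                     [0.1, 0.9, 0.8, 0.1],
                     [0.1, 0.9, 0.7, 0.2],
                     [0.1, 0.1, 0.1, 0.1]])
words = [2, 5, 2]                                   # visual word of each descriptor
keypoints = [(1.0, 1.0), (3.0, 0.0), (2.0, 2.0)]    # (x, y) image coordinates
print(saliency_weighted_bow(words, keypoints, saliency, n_words=8))

In this toy example the descriptor that falls on the low-saliency background (word 5) contributes far less to the histogram than the two descriptors on the salient region (word 2).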
In this chapter, the saliency map demonstrated in [161, 346] is adopted for the construction of the saliency weighting scheme. Figure 5.1 shows the recognition process incorporating saliency maps and re-ranking. The saliency weighting is applied at various stages of the recognition process. Section 5.2 presents the generation of the saliency map. This map is applied to the construction of local descriptors in Sect. 5.3, and to the construction of the SVT codebook, BoW histogram, and similarity function in Sect. 5.4. In Sect. 5.5, a new re-ranking procedure is applied to select the important BoW features and improve the recognition accuracy. Section 5.6 provides experimental results on landmark recognition for two landmark databases.
5.2 Saliency Map Generation
The goal of saliency map generation is to highlight a handful of 'significant' locations where the image is 'informative' according to human perception. The graph-based visual saliency (GBVS) method demonstrated in [161] is applied to accomplish this. There are three stages for modeling visual saliency: