search system requires significant bandwidth, since at the server end, receiving multiple query photos is far more demanding than receiving text queries within the standard uplink bandwidth of a search engine.
A scalable near-duplicate visual search system is typically developed on the basis of a Scalable Vocabulary Tree (SVT) [149]. For large databases, the SVT reduces the computational cost and improves the performance of mobile landmark recognition. Visual vocabulary models quantize local descriptors using K-means clustering [150], the vocabulary tree [149], or approximate K-means [151]. The Bag-of-Words (BoW) model with an inverted indexing structure is usually built on top of the SVT for image descriptors [149, 152-154]. A more compact descriptor can be obtained from the bag-of-features histogram by encoding the position differences of its non-zero bins [155]. The inverted index structure of the SVT can also be further compressed with arithmetic coding, reducing the memory and storage cost of maintaining a scalable visual search system.
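As a rough illustration of how these pieces fit together, the following sketch quantizes local descriptors with a small two-level vocabulary tree and accumulates per-image BoW histograms alongside an inverted index. It is only a minimal sketch: the branch factor, tree depth, descriptor dimensionality, and the use of scikit-learn's KMeans are assumptions made for illustration, not the configuration used in [149].

import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

def build_vocabulary_tree(descriptors, branch=4, depth=2, seed=0):
    # Recursively cluster the descriptors into a small vocabulary tree.
    if depth == 0 or len(descriptors) < branch:
        return None
    km = KMeans(n_clusters=branch, n_init=4, random_state=seed).fit(descriptors)
    children = [build_vocabulary_tree(descriptors[km.labels_ == b],
                                      branch, depth - 1, seed)
                for b in range(branch)]
    return {"centers": km.cluster_centers_, "children": children}

def quantize(tree, desc):
    # Greedily descend the tree; the leaf reached gives the visual word index.
    word, node = 0, tree
    while node is not None:
        b = int(np.argmin(np.linalg.norm(node["centers"] - desc, axis=1)))
        word = word * len(node["centers"]) + b
        node = node["children"][b]
    return word

# Toy data: each "image" is a set of local descriptors (SIFT-like vectors).
rng = np.random.default_rng(0)
images = [rng.normal(size=(200, 16)) for _ in range(5)]
tree = build_vocabulary_tree(np.vstack(images))

n_words = 4 ** 2                      # branch ** depth leaf visual words
inverted_index = defaultdict(set)     # visual word -> ids of images containing it
bow_histograms = []
for img_id, descs in enumerate(images):
    hist = np.zeros(n_words)
    for d in descs:
        w = quantize(tree, d)
        hist[w] += 1
        inverted_index[w].add(img_id)
    bow_histograms.append(hist / hist.sum())   # L1-normalised BoW histogram

At query time, only the inverted-index lists of the visual words present in the query need to be visited, which is what keeps retrieval scalable as the database grows.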
While the aforementioned methods have been the main approaches for constructing the SVT and the associated compact descriptors, this chapter presents a method for improving the performance of SVT-based landmark recognition. As in the previous works [156-160], the goal is to study the discriminative information of image patches in order to evaluate each patch's importance. This information constitutes a weighting scheme for the construction of the SVT and the BoW histogram features. In the conventional SVT and BoW, the local descriptors are assigned equal importance, and hence feature selection for visual word generation is underutilized. However, local descriptors on the foreground landmark should be given more importance, while local descriptors in the background act as outliers for recognition. Background content, such as sky and grass, is usually common to many different landmark categories; its importance should therefore be reduced when generating the BoW histogram [160].
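To make the idea concrete, the sketch below accumulates a BoW histogram in which each quantized descriptor contributes the saliency value sampled at its keypoint location rather than a constant count of one. The function name, the per-descriptor saliency lookup, and the toy data are assumptions made for illustration; the precise weighting used in [160] and in this chapter is defined in the following sections.

import numpy as np

def saliency_weighted_bow(words, keypoints, saliency_map, n_words):
    # Each local descriptor contributes its saliency value instead of a constant 1,
    # so descriptors on the salient foreground dominate the histogram.
    hist = np.zeros(n_words)
    h, w = saliency_map.shape
    for word, (x, y) in zip(words, keypoints):
        # Clamp the keypoint to the map bounds and read its saliency in [0, 1].
        sal = saliency_map[min(int(y), h - 1), min(int(x), w - 1)]
        hist[word] += sal
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy usage: three quantized descriptors over a 4x4 saliency map.
saliency = np.array([[0.1, 0.1, 0.2, 0.1],
                     [0.1, 0.9, 0.8, 0.1],
                     [0.1, 0.9, 0.7, 0.2],
                     [0.1, 0.1, 0.1, 0.1]])
words = [2, 5, 2]                                   # visual word of each descriptor
keypoints = [(1.0, 1.0), (3.0, 0.0), (2.0, 2.0)]    # (x, y) image coordinates
print(saliency_weighted_bow(words, keypoints, saliency, n_words=8))

In this toy example the descriptor that falls on the low-saliency background (word 5) contributes far less to the histogram than the two descriptors on the salient region (word 2).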
In this chapter, the saliency map demonstrated in [161, 346] is adopted for the construction of the saliency weighting scheme. Figure 5.1 shows the recognition process incorporating saliency maps and re-ranking. The saliency weighting is applied at various stages of the recognition process. Section 5.2 presents the generation of the saliency map. This map is applied to the construction of local descriptors in Sect. 5.3, and to the construction of the SVT codebook, BoW histogram, and similarity function in Sect. 5.4. In Sect. 5.5, a new re-ranking procedure is applied to select the important BoW features and improve the recognition accuracy. Section 5.6 provides experimental results on landmark recognition for two landmark databases.
5.2 Saliency Map Generation
The goal of saliency map generation is to highlight a handful of 'significant' locations where the image is 'informative' according to human perception. The graph-based visual saliency (GBVS) method demonstrated in [161] is applied to accomplish this. There are three stages for modeling visual saliency: