The quadkey method is adopted from the Bing Maps Tile System.1 It converts GPS coordinates into a hashing-based representation for fast search and retrieval. We present an example in Fig. 4.6 to walk through the steps of converting WGS-84 GPS coordinates to a quadtree tile code. The coordinates are encoded as a 23-digit code, which corresponds to a ground resolution of about 0.02 m. Distances between locations can then be approximated directly from the quadkey representation. First, GPS context is collected from the mobile sensor; the standard WGS-84 coordinates are then encoded into the quadkey representation. In the illustration, pictures of the same landmark (the Brussels town hall) are taken from both the front and the back façades. The two photos have different WGS-84 coordinates, yet 10 out of their 15 quadkey digits are identical after the Bing Maps projection. In other words, the Hamming distance between the two codes is 5, which can be mapped, via lookup tables, to a ground distance of about 305 m.
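The WGS-84-to-quadkey conversion described above can be sketched in Python. This is a minimal rendering of the publicly documented Bing Maps Tile System algorithm, not the book's own implementation; the default level of 23 matches the 23-digit code mentioned in the text, and the digit-wise Hamming distance mirrors the comparison of the two Brussels photos. The example coordinates for the two viewpoints are illustrative, not the ones used in Fig. 4.6.

```python
import math

def latlon_to_quadkey(lat, lon, level=23):
    """Encode a WGS-84 coordinate as a base-4 quadkey (Bing Maps Tile System)."""
    # Clamp latitude to the valid Web Mercator range.
    lat = min(max(lat, -85.05112878), 85.05112878)
    # Project to normalized [0, 1] map coordinates (Web Mercator).
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    # Scale to pixel coordinates, then to tile coordinates (256-px tiles).
    map_size = 256 << level
    px = min(max(int(x * map_size + 0.5), 0), map_size - 1)
    py = min(max(int(y * map_size + 0.5), 0), map_size - 1)
    tx, ty = px // 256, py // 256
    # Interleave the tile X/Y bits into one base-4 digit per zoom level.
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digits.append(str((1 if tx & mask else 0) + (2 if ty & mask else 0)))
    return "".join(digits)

def quadkey_hamming(q1, q2):
    """Digit-wise Hamming distance between two equal-length quadkeys."""
    return sum(a != b for a, b in zip(q1, q2))

# Two illustrative viewpoints near the Brussels town hall: nearby locations
# share a long quadkey prefix, so distance can be estimated from the codes.
front = latlon_to_quadkey(50.8466, 4.3528)
back = latlon_to_quadkey(50.8462, 4.3517)
```

At level 23 the ground resolution at the equator is 2πR / (256 · 2^23) ≈ 0.019 m with R = 6,378,137 m, consistent with the roughly 0.02 m accuracy stated above.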
This section has presented a context-aware mobile visual search scheme based on the BoW model and a hierarchical visual vocabulary tree, in which contextual GPS information is used to filter the visual search results. In the next section, an implementation named TapTell, built on the CVT algorithm introduced above, is presented. TapTell achieves social activity recommendation through mobile visual search.
4.3 Mobile Visual Search System for Social Activities Using Query Image Contextual Model
TapTell is a system that utilizes visual query input on an advanced multi-touch mobile platform, together with rich context, to enable interactive visual search and contextual recommendation. Unlike other mobile visual search systems, TapTell explores users' individual intent and their motivation in providing a visual query with a specified ROI. By understanding such intent, associated social activities can be recommended to users. Existing work has predominantly focused on understanding intent expressed by text (or text recognized from speech). For example, previous research attempts to estimate a user's search intent by detecting meaningful entities in a textual query [131, 140]. However, typing takes time and can be cumbersome on a phone, and is thus in some cases inconvenient for expressing user intent. An alternative is to leverage speech recognition techniques to support voice as an input. For example, popular mobile search engines enable a voice-to-search mode.2,3 Siri is one of the most popular applications that further structures spoken input into a set of entities.4
However, text as an expression of
1 http://msdn.microsoft.com/en-us/library/bb259689.aspx
2 http://www.discoverbing.com/mobile
3 http://www.google.com/mobile
4 http://siri.com/