Interactive Mobile Visual Search and Recommendation at Internet Scale - Multimedia Database Retrieval: Technology and Applications


or the description of the dish will be analyzed. For example, optical character

recognition (OCR) can help you automatically recognize the indicated text, while

a visual search can help you identify the dish (which may not be recognized

without indication) and recommend nearby restaurants serving a similar dish.

Figure 4.7 shows three corresponding scenarios. The visual intent model consists

of two parts: visual recognition by search and social task recommendation. The first

problem is to recognize what is captured (e.g., a food image), while the second is

to recommend related entities (such as nearby restaurants serving the same food)

based on the search-based recognition results. This activity recommendation is

a difficult task in general, since visual recognition in the first step still remains

challenging. However, advanced functionalities such as natural multi-touch

interaction and the rich context available on the mobile device bring

opportunities to accomplish this task. For example, although one image usually

contains multiple objects, a user can indicate an object or some text of interest

through a natural gesture, so that visual recognition can be reduced to searching for a

similar single object. Moreover, the contextual information, such as geo-location,

can be used for location-based recommendations.
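The two parts of the visual intent model can be illustrated with a minimal sketch. It is not the chapter's implementation: the feature vectors, the cosine-similarity search, the haversine distance filter, the 5 km radius, and all names (Restaurant, recognize_by_search, recommend, the sample data) are assumptions made only for illustration. The sketch assumes the user's gesture has already isolated a single object and that some feature extractor has turned it into a vector.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Restaurant:
    """Hypothetical record for a recommendable entity."""
    name: str
    lat: float
    lon: float
    dishes: frozenset


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recognize_by_search(query_vec, index):
    """Part 1: return the label of the indexed item most similar to the query."""
    return max(index, key=lambda label: cosine(query_vec, index[label]))


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def recommend(label, user_lat, user_lon, restaurants, radius_km=5.0):
    """Part 2: names of nearby restaurants serving the recognized dish, nearest first."""
    hits = []
    for r in restaurants:
        if label in r.dishes:
            d = haversine_km(user_lat, user_lon, r.lat, r.lon)
            if d <= radius_km:
                hits.append((d, r.name))
    return [name for _, name in sorted(hits)]


# Toy data, invented for this example only.
INDEX = {"ramen": [0.9, 0.1, 0.0], "pizza": [0.1, 0.9, 0.2]}
RESTAURANTS = [
    Restaurant("Noodle House", 47.6215, -122.3480, frozenset({"ramen"})),
    Restaurant("Far Ramen", 47.7000, -122.3500, frozenset({"ramen"})),
    Restaurant("Slice Shop", 47.6206, -122.3490, frozenset({"pizza"})),
    Restaurant("Ramen Corner", 47.6300, -122.3493, frozenset({"ramen"})),
]

query = [0.8, 0.2, 0.1]            # features of the user-selected region (assumed)
dish = recognize_by_search(query, INDEX)
nearby = recommend(dish, 47.6205, -122.3493, RESTAURANTS)
print(dish, nearby)
```

The recognition step only labels the selected object; the geo-location enters in the second step, where it filters and ranks the candidate entities.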

Since visual intent is a new term, this chapter reviews the evolution

of intent in general and walks readers through the formation of intent from

text, voice, and visual inputs, covering both desktop-based and mobile-based

search and recognition.

For desktop user intent mining, an early study on web search taxonomy was

introduced by Broder [110]. In this work, the most searched items belong to an

“informational” category, in which users seek related information to answer

certain questions in their minds. A later work by Rose and Levinson further

divided the informational class into five sub-categories, among which locating a

product or service accounts for a large percentage [133]. On the other hand, compared

to general web searches, intents derived from mobile information have strong

on-the-go characteristics. Church and Smyth conducted a diary study of user behavior

in mobile text search and arrived at a categorization quite different from its

general web search counterpart [113]. Besides the informational category at 58.3%,

a new geographical category, which is highly location dependent, takes a share of

31.1% of total search traffic. From a topic perspective, local services and travel &

commuting are the most popular of the 17 topics, with 24.2% and 20.2% of

entries respectively. It can be concluded that on-the-go characteristics play an

important role in intent discovery and understanding on mobile devices [143].

4.3.1 System Architecture

Figure 4.8 shows the architecture of TapTell. It extends Fig. 4.2 by including user

intent. From an implementation perspective, this illustration can help readers

understand the importance of linking individual intents to final recommendations.
