Fig. 4.8 The framework of TapTell, which builds on the visual recognition algorithm previously introduced in Fig. 4.2 and incorporates the visual intent notation
Intent expression recognizes the object specified through the user's interaction with the mobile device. Intent prediction formulates the expressed intent and incorporates image context. Finally, task recommendation is achieved by taking into account both the predicted intent and the sensory context.
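To make the three-stage flow concrete, the following is a minimal, self-contained sketch of the pipeline just described (intent expression, intent prediction, task recommendation). The class and function names, the toy labels, and the keyword-overlap scoring are illustrative assumptions for exposition, not the authors' actual implementation.

```python
# Illustrative sketch of the TapTell pipeline: expression -> prediction -> recommendation.
# All names and the scoring scheme are assumptions made for this example.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class VisualIntent:
    roi: Tuple[int, int, int, int]                            # region selected by the "O" gesture: (x, y, w, h)
    labels: List[str]                                         # object categories recognized inside the ROI
    context_terms: List[str] = field(default_factory=list)    # terms inferred from the surrounding image context


def express_intent(roi: Tuple[int, int, int, int]) -> VisualIntent:
    """Intent expression: recognize the object specified by the user gesture.
    A fixed label stands in here for visual recognition by search (Sect. 4.2)."""
    return VisualIntent(roi=roi, labels=["sushi"])


def predict_intent(intent: VisualIntent, scene_terms: List[str]) -> VisualIntent:
    """Intent prediction: enrich the expressed intent with image context."""
    intent.context_terms = scene_terms
    return intent


def recommend_tasks(intent: VisualIntent,
                    sensory_context: Dict[str, float],
                    tasks: Dict[str, List[str]]) -> List[str]:
    """Task recommendation: rank candidate tasks by how many intent terms they
    match; sensory context (e.g. GPS) could further filter nearby services."""
    terms = set(intent.labels) | set(intent.context_terms)
    scored = [(len(terms & set(keywords)), name) for name, keywords in tasks.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


if __name__ == "__main__":
    intent = predict_intent(express_intent(roi=(120, 80, 200, 160)),
                            scene_terms=["restaurant", "menu"])
    tasks = {"find nearby restaurants": ["restaurant", "sushi"],
             "translate menu": ["menu", "text"]}
    print(recommend_tasks(intent, {"lat": 47.6, "lon": -122.3}, tasks))
```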
In the following, Sect. 4.3.2 presents a user survey and explains why the “O” gesture is chosen as the best solution among several gesture candidates. With the “O” gesture and the selected ROI, visual recognition by search is carried out using the algorithm introduced in the previous section. Sect. 4.3.3 then describes the recommendation, which uses the text metadata associated with the visual recognition results to achieve better re-ranking.
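A hedged sketch of such metadata-based re-ranking is given below: each visually recognized result is assumed to carry text metadata (e.g. tags or business names), and a weighted combination of the visual score and a simple text-relevance score re-orders the result list. The field names, the weighting parameter, and the term-overlap relevance measure are assumptions for illustration, not the method's actual formulation.

```python
# Illustrative re-ranking of visual search results using associated text metadata.
# Scoring scheme and field names are assumptions made for this example.

from typing import Dict, List


def text_relevance(query_terms: List[str], metadata: str) -> float:
    """Fraction of query terms that appear in the result's text metadata."""
    words = set(metadata.lower().split())
    hits = sum(1 for term in query_terms if term.lower() in words)
    return hits / max(len(query_terms), 1)


def rerank(results: List[Dict], query_terms: List[str], alpha: float = 0.7) -> List[Dict]:
    """Re-rank by alpha * visual score + (1 - alpha) * text relevance."""
    def combined(result: Dict) -> float:
        return (alpha * result["visual_score"]
                + (1 - alpha) * text_relevance(query_terms, result["metadata"]))
    return sorted(results, key=combined, reverse=True)


if __name__ == "__main__":
    results = [
        {"id": 1, "visual_score": 0.82, "metadata": "noodle shop downtown"},
        {"id": 2, "visual_score": 0.78, "metadata": "sushi restaurant near station"},
    ]
    print(rerank(results, query_terms=["sushi", "restaurant"]))
```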
4.3.2 User Interaction for Specifying Visual Intent
It has been studied and suggested that a visual interface will improve the mobile search experience [114]. In this section, a user study is conducted to identify the most natural and efficient gesture for specifying the visual intent on mobile devices. By taking advantage of multi-touch interaction on smartphones, three gestures for specifying visual intents on captured photos are defined as follows:
• Tap. A user can “tap” on pre-determined image segments of a captured photo, which is automatically segmented on-the-fly. Then, the tapped segments