Fig. 4.8 The framework of TapTell, which builds on the visual recognition algorithm previously introduced in Fig. 4.2 and incorporates the visual intent notation
Intent expression recognizes the object specified through the user's interaction with the mobile device. Intent prediction formulates the expressed intent and incorporates image context. Finally, task recommendation is achieved by taking into account both the predicted intent and the sensory context.
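To make the three-stage flow concrete, the following is a minimal, self-contained sketch of the pipeline just described (intent expression, intent prediction, task recommendation). The class and function names, the toy labels, and the keyword-overlap scoring are illustrative assumptions for exposition, not the authors' actual implementation.

```python
# Illustrative sketch of the TapTell pipeline: expression -> prediction -> recommendation.
# All names and the scoring scheme are assumptions made for this example.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class VisualIntent:
    roi: Tuple[int, int, int, int]                            # region selected by the "O" gesture: (x, y, w, h)
    labels: List[str]                                         # object categories recognized inside the ROI
    context_terms: List[str] = field(default_factory=list)    # terms inferred from the surrounding image context


def express_intent(roi: Tuple[int, int, int, int]) -> VisualIntent:
    """Intent expression: recognize the object specified by the user gesture.
    A fixed label stands in here for visual recognition by search (Sect. 4.2)."""
    return VisualIntent(roi=roi, labels=["sushi"])


def predict_intent(intent: VisualIntent, scene_terms: List[str]) -> VisualIntent:
    """Intent prediction: enrich the expressed intent with image context."""
    intent.context_terms = scene_terms
    return intent


def recommend_tasks(intent: VisualIntent,
                    sensory_context: Dict[str, float],
                    tasks: Dict[str, List[str]]) -> List[str]:
    """Task recommendation: rank candidate tasks by how many intent terms they
    match; sensory context (e.g. GPS) could further filter nearby services."""
    terms = set(intent.labels) | set(intent.context_terms)
    scored = [(len(terms & set(keywords)), name) for name, keywords in tasks.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


if __name__ == "__main__":
    intent = predict_intent(express_intent(roi=(120, 80, 200, 160)),
                            scene_terms=["restaurant", "menu"])
    tasks = {"find nearby restaurants": ["restaurant", "sushi"],
             "translate menu": ["menu", "text"]}
    print(recommend_tasks(intent, {"lat": 47.6, "lon": -122.3}, tasks))
```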
In the following, Sect. 4.3.2 presents a user survey and explains why the “O” gesture is chosen as the best solution among several gesture candidates. With the “O” gesture and the selected ROI, visual recognition by search is carried out using the algorithm introduced in the previous section. Sect. 4.3.3 then describes the recommendation, which uses the text metadata associated with the visual recognition results to achieve better re-ranking.
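A hedged sketch of such metadata-based re-ranking is given below: each visually recognized result is assumed to carry text metadata (e.g. tags or business names), and a weighted combination of the visual score and a simple text-relevance score re-orders the result list. The field names, the weighting parameter, and the term-overlap relevance measure are assumptions for illustration, not the method's actual formulation.

```python
# Illustrative re-ranking of visual search results using associated text metadata.
# Scoring scheme and field names are assumptions made for this example.

from typing import Dict, List


def text_relevance(query_terms: List[str], metadata: str) -> float:
    """Fraction of query terms that appear in the result's text metadata."""
    words = set(metadata.lower().split())
    hits = sum(1 for term in query_terms if term.lower() in words)
    return hits / max(len(query_terms), 1)


def rerank(results: List[Dict], query_terms: List[str], alpha: float = 0.7) -> List[Dict]:
    """Re-rank by alpha * visual score + (1 - alpha) * text relevance."""
    def combined(result: Dict) -> float:
        return (alpha * result["visual_score"]
                + (1 - alpha) * text_relevance(query_terms, result["metadata"]))
    return sorted(results, key=combined, reverse=True)


if __name__ == "__main__":
    results = [
        {"id": 1, "visual_score": 0.82, "metadata": "noodle shop downtown"},
        {"id": 2, "visual_score": 0.78, "metadata": "sushi restaurant near station"},
    ]
    print(rerank(results, query_terms=["sushi", "restaurant"]))
```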
4.3.2 User Interaction for Specifying Visual Intent
It has been studied and suggested that a visual interface will improve the mobile search experience [114]. In this section, a user study is conducted to identify the most natural and efficient gesture for specifying the visual intent on mobile devices. By taking advantage of multi-touch interaction on smartphones, three gestures for specifying visual intents on captured photos are defined as follows:
• Tap. A user can “tap” on pre-determined image segments of a captured photo, which is automatically segmented on-the-fly. Then, the tapped segments