From a retrieval point of view, although the BoW model has shown promising results in desktop-based visual search over large-scale corpora, it also suffers from a semantic gap. The BoW model is limited by its homogeneous treatment of all image regions: features are extracted uniformly, and local features are given no particular emphasis. Consequently, a query image that carries no notion of priority can mislead the recognition algorithm. Hence, to obtain better retrieval results, local visual information needs to be utilized in a prioritized manner. The multi-touch screen and its user interaction on mobile devices offer such a platform, allowing users to select their regions of interest (ROIs) as prioritized information, with the surrounding context as secondary information.
From a mobile application perspective, visual search via image queries provides a powerful complement to conventional textual and vocal queries. Compared with text or voice queries for on-the-go information retrieval, there are many cases where a query can be expressed more naturally and conveniently through the mobile device's camera sensor (an unknown object or piece of text, an artwork, a shape or texture, and so on) [135]. In addition, mobile visual search has a promising future owing to the vital roles mobile devices play in our lives, from their original function of telephony, to prevalent information-sharing terminals, to hubs that accommodate tens of thousands of applications. While on the go, people use their phones as a personal concierge, discovering what is around them and deciding what to do. The mobile phone is therefore becoming a recommendation terminal customized for the individual, capable of recommending contextually relevant entities (local businesses such as a nearby restaurant or hotel) and simplifying the accomplishment of recommended tasks. As a result, it is important to understand user intent through the multi-modal nature of queries and the rich context available on the phone.
Motivated by the above observations, this chapter presents an interactive search-based visual recognition and contextual recommendation approach using the BoW model, targeting Internet-scale image collections. Smartphone hardware, such as the camera and the touch screen, is leveraged to let users express their ROI in the pictures they take. The visual query, together with the ROI specification, is then processed by an innovative contextual visual retrieval model that establishes a meaningful connection to database images and their associated rich textual information. Once visual recognition is accomplished, the textual information associated with the retrieved images is further analyzed to provide meaningful recommendations.
An actual system, code-named TapTell, is implemented based on the algorithms and methodologies described in Sect. 4.2. A natural user interaction is adopted to achieve the Tap action, for which three gestures are investigated (i.e., circle, line, and tap). It is concluded that the circle (also called the "O" gesture) is the most natural interaction for users, as it best captures the user's preference in selecting the target object. The BoW model and a novel context-embedded vocabulary tree approach are adopted. The algorithm incorporates both the ROI visual query and the context from the pixels surrounding the "O" region to search for similar images in a large-scale image dataset. Through this user interaction (i.e., the "O" gesture) and the BoW model with our innovative algorithm, standard visual recognition can be improved. The Tell action then builds on the recognition result, analyzing the associated textual information to deliver contextual recommendations.
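To make the retrieval idea more concrete, the following is a minimal sketch, not the chapter's actual implementation: the function names, the histogram-intersection similarity, and the weighting parameter alpha are all assumptions made here for illustration. It shows how an ROI query and its surrounding context could be combined under a flat BoW model: local descriptors from the "O" region and from the surrounding pixels are quantized into two separate histograms, and database images are ranked by a weighted combination of the two similarities, with the ROI given higher priority.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantize local descriptors to their nearest visual words and
    return an L1-normalized bag-of-words histogram."""
    if len(descriptors) == 0:
        return np.zeros(len(vocabulary))
    # Euclidean distance from every descriptor to every visual word.
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

def rank_images(roi_desc, context_desc, vocabulary, database_hists, alpha=0.7):
    """Rank database images by a weighted combination of ROI similarity and
    surrounding-context similarity. alpha is an assumed weight that gives the
    ROI higher priority than the secondary context."""
    q_roi = bow_histogram(roi_desc, vocabulary)
    q_ctx = bow_histogram(context_desc, vocabulary)
    scores = []
    for idx, db_hist in enumerate(database_hists):
        # Histogram-intersection similarity for each component.
        sim_roi = np.minimum(q_roi, db_hist).sum()
        sim_ctx = np.minimum(q_ctx, db_hist).sum()
        scores.append((alpha * sim_roi + (1.0 - alpha) * sim_ctx, idx))
    return [idx for _, idx in sorted(scores, reverse=True)]
```

In the approach described above, the context is embedded into the vocabulary-tree quantization itself rather than fused as two flat histograms; this sketch is only meant to convey why distinguishing prioritized ROI information from secondary context information can improve the search.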