Interactive Mobile Visual Search and Recommendation at Internet Scale - Multimedia Database Retrieval: Technology and Applications


Fig. 4.1 Illustration of the bag-of-words framework in computer vision

4.2.1 The Bag-of-Words (BoW) Model

Figure 4.1 shows the framework of the BoW model and its usage in computer vision. In general, there are two parts: learning and recognition. In learning, visual features are extracted from database images or video frames to generate a dictionary of codewords, which is also called a codebook in the literature. Each image then projects its features onto the codebook to obtain its own BoW representation. These representations are then categorized by classifiers in preparation for recognition. In recognition, a query or testing image also goes through the BoW model by mapping its features to the dictionary of codewords. The resulting BoW representation is then classified to determine which class the query image belongs to.
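The learning and recognition steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the method used in the text: the text does not say how the codebook is generated or which classifier is used, so the sketch assumes k-means clustering with a deterministic farthest-point initialization for the codebook and a nearest-neighbour rule on normalized histograms for classification. All function names, and the toy 2-D "descriptors", are hypothetical.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, feature):
    """Index of the codeword closest to a local feature."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], feature))

def build_codebook(features, k, iterations=10):
    """Learning step: cluster local features into k codewords.

    Assumption: plain k-means (Lloyd's algorithm), seeded by
    farthest-point initialization so the result is deterministic.
    """
    codebook = [list(features[0])]
    while len(codebook) < k:
        far = max(features, key=lambda f: min(sq_dist(f, c) for c in codebook))
        codebook.append(list(far))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for f in features:
            clusters[nearest(codebook, f)].append(f)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def bow_histogram(codebook, image_features):
    """Project one image's features onto the codebook (normalized counts)."""
    counts = Counter(nearest(codebook, f) for f in image_features)
    total = max(len(image_features), 1)  # avoid division by zero
    return [counts[i] / total for i in range(len(codebook))]

def classify(query_hist, labelled_hists):
    """Recognition step: label of the closest training histogram.

    Assumption: a 1-nearest-neighbour rule stands in for the classifier.
    """
    return min(labelled_hists, key=lambda item: sq_dist(item[0], query_hist))[1]

# Toy data: each "image" is a list of 2-D local descriptors (hypothetical).
image_a = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.1)]
image_b = [(10.0, 10.1), (9.9, 10.0), (10.2, 9.8)]

codebook = build_codebook(image_a + image_b, k=2)
training = [(bow_histogram(codebook, image_a), "a"),
            (bow_histogram(codebook, image_b), "b")]

query = [(0.1, 0.0), (0.0, 0.2)]
print(classify(bow_histogram(codebook, query), training))
```

In practice the local descriptors would come from a feature extractor and the codebook would contain many more codewords; the structure of the two stages stays the same.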

The BoW model is among the most popular choices in the research community for tackling the multimedia processing challenges associated with the recent boom in large-scale data. Because they follow a homogeneous procedure for describing images or video frames using representative local features, BoW-based methods enable researchers to conduct large-scale image analysis effectively. Large-scale image classification and retrieval have been studied carefully in recent years to keep up with ever-growing image and video datasets. Image classification and retrieval are highly interrelated research problems. Both analyze the distinguishing features of the query image and attempt to bring out similar images from the database. Classification focuses on intra-class commonalities so that the query image can be assigned to its most suitable class. Retrieval, on the other hand, focuses on finding the most closely related individual images in the database
