Fig. 4.1 Illustration of the bag-of-words framework in computer vision
4.2.1 The Bag-of-Words (BoW) Model
Figure 4.1 shows the framework of the BoW model and its use in computer vision.
In general, it has two parts: learning and recognition. In learning, visual features
are extracted from database images or video frames and used to generate a dictionary
of codewords, also called a codebook in the literature. Each image's features are then
projected onto the codebook to obtain a BoW representation of that image, and these
representations are used to train classifiers in preparation for recognition. In
recognition, a query (or test) image passes through the same BoW model: its features
are mapped to the dictionary of codewords, and the resulting BoW representation is
classified to determine which class the query image belongs to.
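The learning and recognition steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the method of any particular system: local features are assumed to be already extracted (toy 2-D vectors stand in for real descriptors), the codebook is learned with k-means (Lloyd's algorithm with a deterministic farthest-point initialization), and a simple nearest-class-mean rule stands in for the classifier, where practical systems often use an SVM or similar. All function names (`learn_codebook`, `bow_histogram`, `train_classifier`, `recognize`) and the toy data are hypothetical.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centers):
    """Index of the center (codeword) closest to vec."""
    return min(range(len(centers)), key=lambda i: sq_dist(vec, centers[i]))


def learn_codebook(descriptors, k, iters=20):
    """Learn k codewords from all training descriptors with Lloyd's k-means.

    Initialization (an assumption of this sketch) is deterministic
    farthest-point selection: start from the first descriptor, then
    repeatedly add the descriptor farthest from the centers chosen so far.
    """
    centers = [list(descriptors[0])]
    while len(centers) < k:
        far = max(descriptors, key=lambda d: min(sq_dist(d, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = defaultdict(list)
        for d in descriptors:
            groups[nearest(d, centers)].append(d)
        # Move each codeword to the mean of its assigned descriptors;
        # a codeword with no members keeps its previous position.
        for i, members in groups.items():
            centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(descriptors, codebook):
    """Project an image's descriptors onto the codebook as L1-normalized word counts."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def train_classifier(histograms, labels):
    """Nearest-class-mean classifier: store the mean BoW histogram of each class."""
    by_class = defaultdict(list)
    for h, y in zip(histograms, labels):
        by_class[y].append(h)
    return {y: [sum(col) / len(hs) for col in zip(*hs)] for y, hs in by_class.items()}


def recognize(query_descriptors, codebook, class_means):
    """Map a query image onto the codebook, then return the closest class label."""
    h = bow_histogram(query_descriptors, codebook)
    return min(class_means, key=lambda y: sq_dist(h, class_means[y]))


# Toy training set (hypothetical): each image is a list of 2-D local descriptors.
train_images = [
    ([(0.0, 0.1), (0.2, 0.0), (0.1, 0.2)], "A"),
    ([(0.1, 0.0), (0.0, 0.2)], "A"),
    ([(5.0, 5.1), (5.2, 4.9), (4.9, 5.0)], "B"),
    ([(5.1, 5.0), (5.0, 5.2)], "B"),
]

# Learning: build the codebook, encode each image, train the classifier.
all_descriptors = [d for descs, _ in train_images for d in descs]
codebook = learn_codebook(all_descriptors, k=2)
train_hists = [bow_histogram(descs, codebook) for descs, _ in train_images]
class_means = train_classifier(train_hists, [label for _, label in train_images])

# Recognition: a query image goes through the same codebook mapping.
print(recognize([(0.05, 0.1), (0.15, 0.05)], codebook, class_means))  # A
print(recognize([(5.05, 5.05)], codebook, class_means))  # B
```

Replacing the nearest-class-mean rule with a discriminative classifier, or using a much larger codebook, would change only `train_classifier`/`recognize` and the `k` argument; the encode-then-classify structure of the two phases stays the same.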
To tackle the multimedia processing challenges associated with the recent boom in
large-scale data, the BoW model has become one of the most popular choices in the
research community. Because they describe images or video frames through a homogeneous
procedure based on representative local features, BoW-based methods enable
researchers to conduct large-scale image analysis effectively. Large-scale image
classification and retrieval have been studied extensively in recent years to keep pace
with ever-growing image and video datasets. Image classification and retrieval
are closely related research problems. Both analyze the distinctive features of the
query image, and both attempt to find similar images in the database. Classification
focuses on intra-class commonalities so that the query image can be assigned to its
appropriate class. Retrieval, on the other hand, focuses on finding the most closely
related individual images in the database