Interactive Mobile Visual Search and Recommendation at Internet Scale - Multimedia Database Retrieval: Technology and Applications

In online image search, the descriptor vectors of a query image are interpreted in the same way as in the indexing procedure, and scores are accumulated for the images in the database with the term frequency-inverse document frequency (tf-idf) scheme [126]. The tf-idf method is an effective entropy weighting for indexing a scalable database. Figure 4.4 shows the computation of image similarity based on the tf-idf scheme. In the vocabulary tree, each leaf node corresponds to a visualword $i$ and is associated with an inverted file that lists the images containing visualword $i$. Only the database images $d$ that share visualwords with the query image $q$ need to be considered, which significantly reduces the number of images to be compared with $q$. The similarity between an image $d$ and the query $q$ is given by

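$$
s(q, d) = \lVert \mathbf{q} - \mathbf{d} \rVert = \sum_{i \mid d_i = 0} |q_i| + \sum_{i \mid q_i = 0} |d_i| + \sum_{i \mid q_i \neq 0,\, d_i \neq 0} |q_i - d_i| \tag{4.2}
$$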

where $\mathbf{q}$ and $\mathbf{d}$ denote the tf-idf feature vectors of the query $q$ and the database image $d$, respectively. They consist of the individual elements $q_i$ and $d_i$, the tf-idf values of the $i$-th visualword in the vocabulary tree for the query and the image, respectively. These elements are given by

$$
q_i = tf_{iq} \cdot idf_i, \tag{4.3}
$$

$$
d_i = tf_{id} \cdot idf_i. \tag{4.4}
$$

In the above equations, the inverse document frequency $idf_i$ is formulated as $\ln(N / N_i)$, where $N$ is the total number of images in the database and $N_i$ is the number of images with the visualword $i$ (i.e., the images whose descriptors are classified into the leaf node $i$).
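
As an illustration of how the inverted files relate to $N$ and $N_i$, the following minimal Python sketch builds the inverted files for a toy database and derives one idf value per visualword. The image ids, the visualword ids, the function names, and the choice of the natural logarithm are all assumptions made for this example, not details taken from the system described here.

```python
import math
from collections import defaultdict


def build_inverted_files(image_words):
    """Map each visualword id to the set of image ids that contain it."""
    inverted = defaultdict(set)
    for image_id, words in image_words.items():
        for i in words:
            inverted[i].add(image_id)
    return dict(inverted)


def inverse_document_frequency(inverted, num_images):
    """idf_i = log(N / N_i); the natural logarithm is an assumption."""
    return {i: math.log(num_images / len(images)) for i, images in inverted.items()}


# Hypothetical toy database: image id -> visualword ids of its quantized descriptors.
image_words = {
    "d1": [0, 0, 2],
    "d2": [1, 2],
    "d3": [2, 3, 3],
    "d4": [0, 1],
}

inverted = build_inverted_files(image_words)
idf = inverse_document_frequency(inverted, num_images=len(image_words))

for i in sorted(idf):
    print(f"visualword {i}: N_i = {len(inverted[i])}, idf = {idf[i]:.3f}")
```

In this toy database, visualword 2 occurs in three of the four images and therefore receives the smallest weight, while visualword 3 occurs in a single image and receives the largest.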

The term frequencies $tf_{iq}$ and $tf_{id}$ are computed as the accumulated counts of the visualword $i$ in the query $q$ and the database image $d$, respectively.
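
The scoring described so far can be sketched in Python as follows. The sketch counts visualwords to obtain term frequencies, weights them as in Eqs. (4.3) and (4.4), collects candidate images from the inverted files, and evaluates the distance of Eq. (4.2) as three sums over the nonzero entries only. The idf values, the toy data, the function names, and the exponent parameter `p` are assumptions made for illustration; smaller distances are treated here as better matches.

```python
from collections import Counter


def tfidf_vector(word_ids, idf):
    """Sparse tf-idf vector {i: tf_i * idf_i}; tf_i is the raw count of visualword i."""
    counts = Counter(word_ids)
    vec = {i: tf * idf.get(i, 0.0) for i, tf in counts.items()}
    return {i: v for i, v in vec.items() if v != 0.0}  # keep nonzero entries only


def distance(q, d, p=1):
    """Sum of |q_i - d_i|**p, split into the three sums of Eq. (4.2)."""
    only_q = sum(abs(v) ** p for i, v in q.items() if i not in d)   # terms with d_i = 0
    only_d = sum(abs(v) ** p for i, v in d.items() if i not in q)   # terms with q_i = 0
    both = sum(abs(q[i] - d[i]) ** p for i in q.keys() & d.keys())  # both nonzero
    return only_q + only_d + both


def rank(query_words, image_words, idf, p=1):
    """Score only the images that share a visualword with the query."""
    # Inverted files: visualword id -> ids of the images containing it.
    inverted = {}
    for image_id, words in image_words.items():
        for i in words:
            inverted.setdefault(i, set()).add(image_id)

    q = tfidf_vector(query_words, idf)
    candidates = set().union(*(inverted.get(i, set()) for i in q))
    scored = [(distance(q, tfidf_vector(image_words[c], idf), p), c) for c in candidates]
    return sorted(scored)  # smallest distance first


# Hypothetical toy data: visualword ids per image and made-up idf values.
image_words = {"d1": [0, 0, 2], "d2": [1, 2], "d3": [2, 3, 3], "d4": [0, 1]}
idf = {0: 0.7, 1: 0.7, 2: 0.3, 3: 1.4}

ranked = rank([3, 3, 2], image_words, idf)
for dist, image_id in ranked:
    print(image_id, round(dist, 3))
```

Image `d4` shares no visualword with the query, so it never enters the candidate set and is not scored at all, which is the saving the inverted files provide.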

One simple means of computing the term frequency is to use the O-query as the initial query without considering the pixels surrounding the “O”. This is equivalent to using “binary” weights for the term frequency $tf_{iq}$: the weight is 1 inside the “O” and 0 outside it. A more descriptive and accurate computation incorporates the context information (i.e., the pixels surrounding the O-query) in the vocabulary tree. We design a new representation of the term frequency $tf_{iq}$ for the O-query. A “soft” weighting scheme is adopted to modulate the term frequency by incorporating the image context outside the O-query, which was neglected in the simple binary scheme. When quantizing descriptors in the CVT, the $tf_{iq}(N_i)$ of the O-query for a particular query visualword $i_q$ is formulated as:
