In online image search, given a query image, we can interpret the descriptor vectors of the image in the same way as in the indexing procedure, and accumulate scores for the images in the database with a so-called term frequency-inverse document frequency (tf-idf) scheme [126]. This tf-idf method is an effective entropy weighting for indexing a scalable database. Figure 4.4 shows the computation of image similarity based on the tf-idf scheme. In the vocabulary tree, each leaf node corresponds to a visual word $i$ and is associated with an inverted file (the list of images containing visual word $i$). Note that we only need to consider database images $d$ that share visual words with the query image $q$. This significantly reduces the number of images that have to be compared with $q$. The similarity between an image $d$ and the query $q$ is given by
$$
s(q, d) = \lVert \mathbf{q} - \mathbf{d} \rVert_2^2 = \sum_{i \mid d_i = 0} |q_i|^2 + \sum_{i \mid q_i = 0} |d_i|^2 + \sum_{i \mid q_i \neq 0,\, d_i \neq 0} |q_i - d_i|^2 \tag{4.2}
$$
where $\mathbf{q}$ and $\mathbf{d}$ denote the tf-idf feature vectors of the query $q$ and the database image $d$, respectively. Their elements $q_i$ and $d_i$ are the tf-idf values of the $i$-th visual word in the vocabulary tree for the query and the image, respectively. They are given by
$$
q_i = \mathit{tf}_i^{q} \cdot \mathit{idf}_i, \tag{4.3}
$$

$$
d_i = \mathit{tf}_i^{d} \cdot \mathit{idf}_i. \tag{4.4}
$$
In the above equations, the inverse document frequency $\mathit{idf}_i$ is formulated as $\ln(N / N_i)$, where $N$ is the total number of images in the database and $N_i$ is the number of images containing visual word $i$ (i.e., the images whose descriptors are classified into leaf node $i$).
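The inverse document frequency can be illustrated with a short Python sketch. It assumes, purely for illustration, that each database image has already been quantized into a list of leaf-node indices; the function name `idf_weights` and the toy database are hypothetical and do not come from the text.

```python
import math
from collections import Counter


def idf_weights(image_words):
    """Return {visual word i: ln(N / N_i)} for a quantized database.

    image_words maps an image id to the list of visual-word (leaf-node)
    indices of its descriptors. This input layout is an assumption made
    for the sketch.
    """
    n_images = len(image_words)  # N: total number of images
    doc_freq = Counter()         # N_i: images containing word i
    for words in image_words.values():
        doc_freq.update(set(words))  # count each image at most once per word
    return {i: math.log(n_images / n_i) for i, n_i in doc_freq.items()}


# Toy database: three images, four visual words (made-up data).
database = {"a": [0, 0, 1], "b": [1, 2], "c": [2, 2, 3]}
idf = idf_weights(database)
```

In this toy database, visual word 0 occurs in one of three images, so its weight is ln(3), whereas word 1 occurs in two images and receives the smaller weight ln(1.5). A word that occurs in every image would get ln(1) = 0 and so would not contribute to the score.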
The term frequencies $\mathit{tf}_i^{q}$ and $\mathit{tf}_i^{d}$ are computed as the accumulated counts of visual word $i$ in the query $q$ and the database image $d$, respectively.
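The scoring procedure described so far can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: tf-idf vectors are stored as sparse dictionaries, the idf values are made-up numbers chosen for readability, and all function names (`tfidf_vector`, `build_inverted_file`, `similarity`, `search`) are hypothetical. The `similarity` function evaluates the three sums of Eq. (4.2) over non-zero entries only.

```python
from collections import Counter, defaultdict


def tfidf_vector(words, idf):
    """Sparse tf-idf vector {i: tf_i * idf_i}; tf_i is the raw count of word i."""
    return {i: count * idf[i] for i, count in Counter(words).items() if i in idf}


def build_inverted_file(db_vectors):
    """Map each visual word to the set of images that contain it."""
    inverted = defaultdict(set)
    for image_id, vec in db_vectors.items():
        for word in vec:
            inverted[word].add(image_id)
    return inverted


def similarity(q, d):
    """Squared L2 distance of Eq. (4.2), split into its three sums."""
    only_q = sum(v * v for i, v in q.items() if i not in d)       # d_i = 0
    only_d = sum(v * v for i, v in d.items() if i not in q)       # q_i = 0
    shared = sum((q[i] - d[i]) ** 2 for i in q.keys() & d.keys()) # both non-zero
    return only_q + only_d + shared


def search(q, db_vectors, inverted):
    """Score only the images that share a visual word with the query."""
    candidates = set()
    for word in q:
        candidates |= inverted.get(word, set())
    return sorted((similarity(q, db_vectors[c]), c) for c in candidates)


# Made-up idf values and quantized images, for illustration only.
idf = {0: 1.0, 1: 0.5, 2: 1.5, 3: 2.0}
db_words = {"a": [0, 0, 1], "b": [1, 1, 2, 2], "c": [3, 3]}
db_vectors = {name: tfidf_vector(w, idf) for name, w in db_words.items()}
inverted = build_inverted_file(db_vectors)

query = tfidf_vector([0, 1], idf)
results = search(query, db_vectors, inverted)
print(results)
```

Because Eq. (4.2) is a squared distance, a smaller value means a closer match, so the sketch sorts in ascending order. Image "c" shares no visual word with the query and is never scored, which is the saving the inverted file provides.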
One simple way to compute the term frequency is to use the O-query as the initial query without considering the pixels surrounding the “O”. This process is equivalent to using “binary” weights for the term frequency $\mathit{tf}_i^{q}$: the weight is 1 inside the “O” and 0 outside it. A more descriptive and accurate computation is to incorporate the context information (i.e., the pixels surrounding the O-query) in the vocabulary tree. We design a new representation of the term frequency $\mathit{tf}_i^{q}$ for the O-query. A “soft” weighting scheme is adopted to modulate the term frequency by incorporating the image context outside the O-query, which was neglected in the simple binary scheme. When quantizing descriptors in the CVT, the $\mathit{tf}_i^{q}$ of the O-query for a particular query visual word $i^{q}$ is formulated as: