higher distinctness [20]. In our implementation, each elliptical region detected
by MSER is normalized into a circular region, from which a SIFT descriptor is
computed. About 6M MSER descriptors are extracted from 50,000 sampled images
and are then quantized to form a dictionary of 5,000 visual words. In the
following, we will show the experimental results for topic-sensitive influencer mining
and personalized image search, respectively.
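The quantization step described above can be illustrated with a minimal sketch. It assumes a plain Lloyd k-means quantizer with a deterministic initialization, which is our choice for illustration only; the text does not state which clustering method is used. The function names (`build_codebook`, `bag_of_visual_words`) and the 2-D toy vectors, which stand in for 128-D SIFT descriptors, are hypothetical.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, desc):
    """Index of the codeword (visual word) closest to a descriptor."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], desc))

def build_codebook(descriptors, num_words, iters=10):
    """Lloyd k-means (an assumed quantizer), seeded with evenly spaced samples."""
    step = len(descriptors) // num_words
    codebook = [list(descriptors[k * step]) for k in range(num_words)]
    for _ in range(iters):
        clusters = [[] for _ in range(num_words)]
        for d in descriptors:
            clusters[nearest(codebook, d)].append(d)
        for k, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster is empty
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def bag_of_visual_words(image_descriptors, codebook):
    """Histogram of visual-word counts for one image."""
    counts = Counter(nearest(codebook, d) for d in image_descriptors)
    return [counts.get(k, 0) for k in range(len(codebook))]

# Toy 2-D "descriptors" forming two well-separated groups.
toy = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
codebook = build_codebook(toy, num_words=2)

# One image with three descriptors: one near the first group, two near the second.
hist = bag_of_visual_words([[0.05, 0.05], [5.0, 5.0], [4.9, 5.2]], codebook)
print(hist)  # [1, 2]
```

With millions of descriptors and thousands of visual words, an approximate or mini-batch clustering routine from an established library would be the more practical choice; the sketch only shows how descriptors map to visual-word indices.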
4.5.2 Topic-Sensitive Influencer Mining Evaluation
4.5.2.1 Topic Number T Selection
In topic modeling, selecting the topic number T is not trivial. We resort to the
perplexity measure in this chapter, which is a standard measure for estimating how
well a generative model fits the data. The lower the perplexity score is, the better
the performance. The perplexity of a set of tag-visual descriptor test pairs
$(\mathbf{w}_d, \mathbf{v}_d)$, for all test images $d \in D_{test}$, is defined as the
exponential of the negative normalized predictive log-likelihood using the trained
model. Formally,
\[
\mathrm{Perplexity}(D_{test}) = \exp\left\{ -\frac{\sum_{d \in D_{test}} \ln p(\mathbf{w}_d \mid \mathbf{v}_d)}{\sum_{d \in D_{test}} n_d + n_d} \right\} \tag{4.17}
\]
where
\[
p(\mathbf{w}_d \mid \mathbf{v}_d) = \prod_{(w_d, v_d) \in \{\mathbf{w}_d, \mathbf{v}_d\}} \sum_{t=1}^{T} p(w_d \mid Z_t)\, p(Z_t \mid v_d)
\]
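To make the perplexity computation concrete, the following is a minimal sketch of Eq. (4.17) on toy data. It assumes that the trained model exposes two lookup tables, `p_w_given_z[t][w]` for p(w | Z_t) and `p_z_given_v[v][t]` for p(Z_t | v); these names and layouts are hypothetical. It further assumes that the normalizing count is the number of tag-descriptor pairs in each test image, which is a simplification made for this sketch.

```python
import math

def pair_likelihood(tag, descriptor, p_w_given_z, p_z_given_v):
    """sum_t p(w | Z_t) * p(Z_t | v) for one (tag, visual word) pair."""
    return sum(p_w_given_z[t][tag] * p_z_given_v[descriptor][t]
               for t in range(len(p_w_given_z)))

def perplexity(test_images, p_w_given_z, p_z_given_v):
    """Exponential of the negative log-likelihood per pair over the test set."""
    log_lik, n_pairs = 0.0, 0
    for pairs in test_images:  # one list of (tag, descriptor) pairs per image
        for tag, descriptor in pairs:
            log_lik += math.log(
                pair_likelihood(tag, descriptor, p_w_given_z, p_z_given_v))
        n_pairs += len(pairs)  # assumed normalizer: number of pairs
    return math.exp(-log_lik / n_pairs)

# Toy setting: T = 2 topics, 4 tags, 2 visual words, 2 held-out images.
test_images = [[(0, 0), (3, 1)], [(0, 0)]]

# An uninformative model: every tag is equally likely under every topic.
uniform_w = [[0.25] * 4, [0.25] * 4]
uniform_z = [[0.5, 0.5], [0.5, 0.5]]
ppl_uniform = perplexity(test_images, uniform_w, uniform_z)  # about 4.0

# A sharper model whose topics match the test pairs.
sharp_w = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
sharp_z = [[0.9, 0.1], [0.1, 0.9]]
ppl_sharp = perplexity(test_images, sharp_w, sharp_z)

print(ppl_uniform, ppl_sharp)
```

The uninformative model yields a perplexity equal to the tag vocabulary size, while the model that fits the held-out pairs yields a lower score. Repeating the computation for models trained with different values of T is one way to compare candidate topic numbers.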
We first test the perplexity of the model on a held-out set of 2,000 images for
different settings of the topic number T. The perplexity scores of the proposed mmTIM
over iterations are shown in Fig. 4.5. We can see that the perplexity scores decrease
Fig. 4.5 Perplexity over iterations for different topic numbers T
 