higher distinctness [20]. In our implementation, each detected elliptical region
from MSER is normalized into a circular region, from which a SIFT descriptor is
computed. About 6M MSER descriptors are extracted from 50,000 sampled images
and then quantized into a dictionary of 5,000 visual words. In the
following, we present the experimental results for topic-sensitive influencer mining
and personalized image search, respectively.
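The quantization step above can be illustrated with a minimal sketch. It assumes the dictionary of visual words is already available as a list of centre vectors and that each descriptor is assigned to its nearest centre under Euclidean distance; the text does not state which assignment rule or clustering method was used, so both are assumptions. The function names and the tiny 2-D vectors are hypothetical stand-ins for real SIFT descriptors and the 5,000-word dictionary.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, dictionary):
    """Index of the visual word (centre) closest to the descriptor.

    Nearest-centre assignment is an assumption; the text only says the
    descriptors are quantized into a dictionary.
    """
    return min(
        range(len(dictionary)),
        key=lambda k: squared_distance(descriptor, dictionary[k]),
    )


def bag_of_visual_words(descriptors, dictionary):
    """Count how many descriptors of one image fall on each visual word."""
    counts = Counter(nearest_word(d, dictionary) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(dictionary))]


# Toy data: 3 visual words and 4 descriptors in 2-D (hypothetical values).
dictionary = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
descriptors = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.3], [1.1, 0.8]]

histogram = bag_of_visual_words(descriptors, dictionary)
print(histogram)
```

Each image is then represented by its list of visual-word indices (or the resulting histogram), which is the form a topic model over tags and visual words would consume.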
4.5.2 Topic-Sensitive Influencer Mining Evaluation
4.5.2.1 Topic Number T Selection
In topic modeling, selecting the topic number $T$ is not trivial. We resort to the
perplexity measure in this chapter, which is a standard measure for estimating how
well a generative model fits the data. The lower the perplexity score, the better
the performance. The perplexity of a set of tag-visual descriptor test pairs
$(\mathbf{w}_d, \mathbf{v}_d)$, for all images $d \in D_{test}$, is defined as the
exponential of the negative normalized predictive log-likelihood using the trained
model. Formally,
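$$\mathrm{Perplexity}(D_{test}) = \exp\left\{ -\frac{\sum_{d \in D_{test}} \ln p(\mathbf{w}_d \mid \mathbf{v}_d)}{\sum_{d \in D_{test}} (n_d + n_d)} \right\} \qquad (4.17)$$

where

$$p(\mathbf{w}_d \mid \mathbf{v}_d) = \prod_{(w_d, v_d) \in \{\mathbf{w}_d, \mathbf{v}_d\}} \sum_{t=1}^{T} p(w_d \mid Z_t)\, p(Z_t \mid v_d)$$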
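As a rough illustration of how Eq. 4.17 might be evaluated, the sketch below computes perplexity from two hypothetical probability tables, `p_w_given_z[t][w]` for $p(w \mid Z_t)$ and `p_z_given_v[v][t]` for $p(Z_t \mid v)$. It assumes each test image is given as a list of (tag, visual word) index pairs together with a single per-image normalizing count supplied by the caller, because the text here does not spell out how the $n_d$ terms are counted. The data layout and all names are assumptions for illustration, not the authors' implementation.

```python
import math


def pair_likelihood(w, v, p_w_given_z, p_z_given_v):
    """sum over topics t of p(w | Z_t) * p(Z_t | v) for one (tag, visual word) pair."""
    return sum(
        p_w_given_z[t][w] * p_z_given_v[v][t]
        for t in range(len(p_w_given_z))
    )


def perplexity(test_images, p_w_given_z, p_z_given_v):
    """Exponential of the negative normalized predictive log-likelihood.

    test_images: list of (pairs, count) tuples, where pairs is a list of
    (tag_index, visual_word_index) and count is the per-image normalizer
    (assumed to be provided by the caller).
    """
    log_likelihood = 0.0
    normalizer = 0
    for pairs, count in test_images:
        for w, v in pairs:
            log_likelihood += math.log(
                pair_likelihood(w, v, p_w_given_z, p_z_given_v)
            )
        normalizer += count
    return math.exp(-log_likelihood / normalizer)


# Toy test set: one image with two pairs and a normalizer of 2 (hypothetical).
test_images = [([(0, 0), (1, 1)], 2)]

# An uninformative model with T = 2 topics: every pair has likelihood 0.5.
uniform_w = [[0.5, 0.5], [0.5, 0.5]]
uniform_z = [[0.5, 0.5], [0.5, 0.5]]
toy_perplexity = perplexity(test_images, uniform_w, uniform_z)

# A sharper model that links visual word v to topic v and topic t to tag t.
sharp_w = [[0.9, 0.1], [0.1, 0.9]]
sharp_z = [[1.0, 0.0], [0.0, 1.0]]
sharp_perplexity = perplexity(test_images, sharp_w, sharp_z)

print(toy_perplexity, sharp_perplexity)
```

In this toy setting the model that fits the test pairs better receives the lower score, which matches the "lower is better" reading of perplexity used to compare different settings of $T$.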
We first test the perplexity of the model on a held-out set of 2,000 images for
different settings of the topic number $T$. The perplexity scores of the proposed mmTIM
over iterations are shown in Fig. 4.5. We can see that the perplexity scores decrease
Fig. 4.5 The perplexities over iterations for different topic numbers $T$