model does not fully explore the image-image intrarelations. Both TRVSC and M-E
Graph suffer from high computational cost, which limits their performance in
large-scale applications. Because these methods are difficult to implement, the results
of TRVSC and M-E Graph are taken from [23], which conducted tag refinement on a
selected subset of NUS-WIDE. Their performance tends to be lower on the whole
NUS-WIDE dataset. Using factor analysis methods, MPMF and LR perform well on sparse
datasets, which is consistent with what their authors demonstrated. Among the different
settings of the proposed approach, RMTF and MTF_0/1 are superior to the other algorithms,
showing the advantage of incorporating user information. By interpreting the tagging
data with the proposed ranking scheme instead of the conventional 0/1 scheme,
RMTF is generally better than MTF_0/1. Without smoothness priors, TF_0/1 fails
to preserve the affinity structures and achieves inferior results.
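The difference between the 0/1 interpretation and a ranking interpretation of tagging data can be illustrated with a minimal sketch. The exact RMTF objective is not given in this passage, so the code below is only a generic illustration: the function names, the toy scores, and the logistic form of the pairwise loss are assumptions made here, not the authors' formulation.

```python
import math

def pointwise_01_loss(scores, observed):
    """Squared error against 0/1 targets.

    Every tag that was not observed is treated as a true 0, so a relevant
    but missing tag is penalized like an irrelevant one.
    """
    return sum(
        (score - (1.0 if tag in observed else 0.0)) ** 2
        for tag, score in scores.items()
    )

def pairwise_ranking_loss(scores, positives, negatives):
    """Logistic loss on score differences over (positive, negative) pairs.

    Only tags placed in `negatives` are pushed below the positives; tags in
    neither set are left unconstrained. The logistic form is an assumption.
    """
    loss = 0.0
    for p in positives:
        for n in negatives:
            diff = scores[p] - scores[n]
            loss += math.log(1.0 + math.exp(-diff))  # equals -log(sigmoid(diff))
    return loss

# Toy example (hypothetical scores): "sea" is relevant but was never assigned.
scores = {"beach": 0.9, "sea": 0.8, "car": 0.1}
loss_01 = pointwise_01_loss(scores, observed={"beach"})
loss_rank = pairwise_ranking_loss(scores, positives={"beach"}, negatives={"car"})
```

In this toy case the 0/1 loss is dominated by the high score given to the unobserved tag "sea", whereas the pairwise loss ignores "sea" because it was not selected as a negative.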
We note that TF_rank follows the same spirit as Rendle's works [31, 32] and was
implemented for comparison with the proposed RMTF method.
Consistent with the discussion in Sect. 2.3.2 that Rendle's works cannot fully account
for the issues of missing tags and noisy tags, TF_rank obtains less improvement than
the proposed RMTF. In fact, because it does not use smoothness
constraints, TF_rank is even inferior to MTF_0/1. In addition, under the negative
set selection strategy of TF_rank, the optimization algorithm needs to consider
redundant pairs of training samples. As a result, TF_rank generally converges
more slowly than MTF_0/1 and RMTF.
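The effect of negative set selection on the number of training pairs can be sketched as follows. This passage does not spell out the selection strategies, so the sketch assumes that a TF_rank-style scheme treats every tag not assigned in a post as a negative, and it contrasts this with a hypothetical restricted negative set; the vocabulary size and tag names are invented for illustration.

```python
def training_pairs(positives, negatives):
    """Enumerate the (positive, negative) tag pairs one post contributes."""
    return [(p, n) for p in sorted(positives) for n in sorted(negatives)]

# Hypothetical vocabulary of 1000 tags and one post with three assigned tags.
vocabulary = {f"tag{i}" for i in range(1000)}
positives = {"tag0", "tag1", "tag2"}

# Assumption: every unassigned tag is used as a negative.
all_unobserved = vocabulary - positives

# Assumption: a restricted strategy keeps only a few selected negatives.
restricted = {"tag3", "tag4", "tag5", "tag6", "tag7"}

pairs_all = training_pairs(positives, all_unobserved)
pairs_restricted = training_pairs(positives, restricted)
```

Under these assumptions a single post yields 2991 pairs with the exhaustive negative set and 15 with the restricted one, which suggests why an optimizer that visits the exhaustive set may need more updates to converge.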
The detailed performance for a representative subset of the 81 tags is provided in
Fig. 2.5. We see that, for simple concepts like “airport,” “beach,” “bear,” and “birds,”
our methods achieve performance comparable to, if not worse than, the baselines. The
reason is that images containing these concepts depict concrete and tangible objects,
for which image understanding can be effectively conducted by propagating visual
similarities and exploiting only the image-tag relations. In contrast, for more abstract and
complex concepts like “cityscape,” “earthquake,” “military,” and “protest,” existing
methods that focus on image appearance and tag semantics fail and our
Fig. 2.5 F-score of a subset of the 81 tags for different algorithms. ©[2012] IEEE. Reprinted, with
permission, from Ref. [34]
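A per-tag F-score of the kind plotted in Fig. 2.5 can be computed as in the following minimal sketch. It assumes the balanced F1 measure (harmonic mean of precision and recall) evaluated per tag over images; the function names and toy data are hypothetical, and the exact evaluation protocol of the experiments may differ.

```python
def f_score(predicted, relevant):
    """Balanced F-score between a predicted set and a ground-truth set."""
    true_positives = len(predicted & relevant)
    if true_positives == 0:
        return 0.0  # also covers empty predicted or relevant sets
    precision = true_positives / len(predicted)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)

def per_tag_f_scores(predicted_tags, true_tags, tags):
    """Compute one F-score per tag.

    predicted_tags, true_tags: dict mapping image id -> set of tags.
    """
    result = {}
    for tag in tags:
        predicted_images = {img for img, ts in predicted_tags.items() if tag in ts}
        relevant_images = {img for img, ts in true_tags.items() if tag in ts}
        result[tag] = f_score(predicted_images, relevant_images)
    return result

# Toy data (hypothetical): four images, two tags.
predicted = {1: {"beach"}, 2: {"beach", "protest"}, 3: {"protest"}, 4: set()}
truth = {1: {"beach"}, 2: {"protest"}, 3: {"protest"}, 4: {"beach"}}
scores_by_tag = per_tag_f_scores(predicted, truth, ["beach", "protest"])
```

Here "beach" has precision 0.5 and recall 0.5, giving an F-score of 0.5, while "protest" is predicted exactly and scores 1.0.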