User-Perceptive Multimedia Content Analysis - User-centric Social Multimedia Computing


problems, we present a ranking optimization scheme which intuitively considers the

user tagging behaviors and addresses the issues of missing tags and noisy tags.

We note that only the qualitative difference is important, and fitting the numerical values of 1 and 0 is unnecessary. Therefore, instead of solving a point-wise classification task, we formulate it as a ranking problem which uses tag pairs within each user-image combination $(u, i)$ as the training data and optimizes for correct ranking. For example, $y(u, i, t^{+}) > y(u, i, t^{-})$ indicates that user $u$ considers tag $t^{+}$ a better description of image $i$ than tag $t^{-}$.
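The difference between fitting values and ranking can be illustrated with a minimal sketch. It is not the optimization used in this chapter; the function name, the score dictionaries, and the tags are hypothetical, and the measure shown is simply the fraction of positive/negative tag pairs within one post that are ordered correctly.

```python
from itertools import product


def pairwise_ranking_accuracy(scores, positive_tags, negative_tags):
    """Return the fraction of (t+, t-) pairs whose scores are correctly ordered."""
    pairs = list(product(positive_tags, negative_tags))
    if not pairs:
        return 0.0
    correct = sum(1 for t_pos, t_neg in pairs if scores[t_pos] > scores[t_neg])
    return correct / len(pairs)


# Hypothetical scores for one post (u, i); none of them is close to 1 or 0.
good_scores = {"fish": 0.40, "car": 0.10, "dog": 0.35}
bad_scores = {"fish": 0.40, "car": 0.10, "dog": 0.50}

good_acc = pairwise_ranking_accuracy(good_scores, ["fish"], ["car", "dog"])
bad_acc = pairwise_ranking_accuracy(bad_scores, ["fish"], ["car", "dog"])
```

In this toy case the first score set ranks every pair correctly even though no score equals 1 or 0, which is the sense in which only the qualitative difference matters.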

We provide some notation for ease of explanation. Each user-image combination $(u, i)$ is defined as a post. The set of observed posts is denoted as $\mathcal{P}_O$:

$\mathcal{P}_O = \{(u, i) \mid \exists\, t \in \mathcal{T},\ y_{u,i,t} = 1\}$    (2.7)

The neutral triplets constitute a set

$\mathcal{M} = \{(u, i, t) \mid (u, i) \notin \mathcal{P}_O\}$    (2.8)

It is arbitrary to treat the neutral triplets as either positive or negative, so we remove all the triplets in $\mathcal{M}$ from the learning process (filled by bold question marks in Fig. 2.2b).
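As a rough illustration of these two sets, the sketch below derives the observed posts from a set of observed (user, image, tag) triplets and then collects the neutral triplets, assuming that a triplet is neutral exactly when its post was never observed. The function names and the toy data are hypothetical.

```python
def observed_posts(positive_triplets):
    """Posts (u, i) that carry at least one observed tag."""
    return {(u, i) for (u, i, _tag) in positive_triplets}


def neutral_triplets(users, images, tags, posts):
    """Triplets (u, i, t) whose post (u, i) was never observed."""
    return {
        (u, i, t)
        for u in users
        for i in images
        for t in tags
        if (u, i) not in posts
    }


# Hypothetical toy data: two users, two images, three tags.
users = ["user1", "user2"]
images = ["image1", "image2"]
tags = ["tag1", "tag2", "tag3"]
positive_triplets = {("user1", "image1", "tag3"), ("user2", "image2", "tag1")}

posts = observed_posts(positive_triplets)
neutral = neutral_triplets(users, images, tags, posts)
```

Here two of the four possible posts are unobserved, so all six of their triplets are neutral and would be left out of learning.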

For the training pair determination, we consider two characteristics of user tagging behavior. On one hand, some concepts may be missing from the user-generated tags. We assume that tags which co-occur frequently are likely to appear in the same image (we call them context-relevant). On the other hand, users will not bother to use all the relevant tags to describe an image. Tags that are semantic-relevant to the observed tags are therefore also potentially good descriptions of the image. The two assumptions are reasonable. Looking at the running example, user1 annotated image1 with tag3 (we assume tag3 describes Nemo, e.g., tag3 = “fish”). We can see that the tags “water,” “sea,” and “coral,” which are context-relevant, and “animal,” “seafish,” and “clownfish,” which are semantic-relevant to the tag “fish,” are all good descriptions of image1. To implement this idea, we build a tag-affinity graph $W_T$ based on tag semantic and context intrarelations.$^{5}$ The tags with the $k$ highest affinity values are considered semantic-relevant or context-relevant.
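The selection of the $k$ most affine tags can be sketched as follows. This is only an illustration: the affinity values are invented, the graph is stored as a nested dictionary rather than a matrix, and a single affinity value per tag pair is assumed, whereas the actual construction of $W_T$ is described in the next subsection.

```python
def top_k_relevant(affinity, tag, k):
    """Return the k tags with the highest affinity to `tag` (ties broken by name)."""
    neighbors = {t: w for t, w in affinity[tag].items() if t != tag}
    ranked = sorted(neighbors, key=lambda t: (-neighbors[t], t))
    return set(ranked[:k])


# Hypothetical affinity values between "fish" and other tags.
affinity = {
    "fish": {
        "animal": 0.90,
        "water": 0.80,
        "sea": 0.70,
        "coral": 0.60,
        "car": 0.05,
    }
}

relevant_to_fish = top_k_relevant(affinity, "fish", k=3)
```

With these made-up values and k = 3, the tags "animal", "water", and "sea" are treated as relevant to "fish", while "car" is not.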

Given the possible noise in user-generated tags, it is risky to add the semantic- or context-relevant tags to the positive set. Therefore, we choose a conservative strategy: we keep only the unobserved tags that are semantic-irrelevant and context-irrelevant to all of the observed tags, to form the negative tag set. Note that the ranking optimization is performed over each post, and within each post $(u, i)$ a positive tag set $\mathcal{T}^{+}_{u,i}$ and a negative tag set $\mathcal{T}^{-}_{u,i}$ are needed to construct the training pairs. Given a post $(u, i) \in \mathcal{P}_O$, the observed tags constitute the positive tag set (the corresponding triplets are filled by plus signs in Fig. 2.2b):

$\mathcal{T}^{+}_{u,i} = \{t \mid (u, i) \in \mathcal{P}_O \wedge y_{u,i,t} = 1\}$    (2.9)
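The conservative strategy described above can be sketched in a few lines. The sketch assumes that the set of semantic- or context-relevant tags for each tag is already available (here a hypothetical dictionary `relevant`); the function names and the toy data are illustrative and not taken from the original method.

```python
from itertools import product


def positive_tag_set(post, positive_triplets):
    """Observed tags of the post (u, i)."""
    return {t for (u, i, t) in positive_triplets if (u, i) == post}


def negative_tag_set(all_tags, positive, relevant):
    """Unobserved tags that are not relevant to any observed tag."""
    excluded = set(positive)
    for tag in positive:
        excluded |= relevant.get(tag, set())
    return set(all_tags) - excluded


def training_pairs(positive, negative):
    """All (t+, t-) pairs of one post, in a deterministic order."""
    return list(product(sorted(positive), sorted(negative)))


# Hypothetical toy data for the post (user1, image1).
all_tags = {"fish", "sea", "animal", "car", "building"}
positive_triplets = {("user1", "image1", "fish")}
relevant = {"fish": {"sea", "animal"}}

pos = positive_tag_set(("user1", "image1"), positive_triplets)
neg = negative_tag_set(all_tags, pos, relevant)
pairs = training_pairs(pos, neg)
```

In this example "sea" and "animal" are neither positive nor negative: they are unobserved, yet too closely related to "fish" to be safely used as negatives.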

5 Details of the $W_T$ construction are introduced in the next subsection.
