User-Perceptive Multimedia Content Analysis - User-centric Social Multimedia Computing


simultaneously modeled the tag-tag, image-image, and image-tag relationships, they aggregated the images' tags over all users, thereby losing important information about individual users' variations in tag usage. In this chapter, we exploit the social aspect of photo sharing websites and incorporate the user factor into the tag refinement problem. We expect that incorporating user information will help explain the tagging data and lead to better estimates of the image and tag factors.

2.3 Methods for Social Image Tag Refinement

The low-dimensional user, image, and tag factor matrices can be viewed as compact representations in the corresponding latent subspaces. The latent subspaces capture the relevant attributes: e.g., the user dimensions are related to users' preferences or social interests, the image dimensions indicate visual themes, and the tag dimensions are related to the semantic topics of tags. The basic intuition behind this work is that incorporating user information will help extract more compact and informative image and tag representations in the semantic subspaces. The task of image tag refinement is then solved by computing the cross-space image-tag associations. In this section, we first introduce the idea of jointly modeling the user, image, and tag factors in a tensor factorization framework, and then explain how to employ the derived factors for tag refinement.

In the following, we denote tensors by calligraphic uppercase letters (e.g., $\mathcal{Y}$), matrices by uppercase letters (e.g., $U$, $T$), vectors by bold lowercase letters (e.g., $\mathbf{u}_i$), scalars by lowercase letters (e.g., $u_i$), and sets by blackboard bold letters (e.g., $\mathbb{U}$, $\mathbb{I}$, $\mathbb{T}$).

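Tensor Factorization. There are three types of entities in photo sharing websites: users, images, and tags. The tagging data can be viewed as a set of triplets. Let $\mathbb{U}$, $\mathbb{I}$, and $\mathbb{T}$ denote the sets of users, images, and tags, respectively, and let the set of observed tagging data be denoted by $\mathbb{O} \subseteq \mathbb{U} \times \mathbb{I} \times \mathbb{T}$, i.e., each triplet $(u, i, t) \in \mathbb{O}$ means that user $u$ has annotated image $i$ with tag $t$.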
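As a concrete illustration of the triplet view, the following Python sketch maps raw (user, image, tag) annotations to integer indices and collects the observed set $\mathbb{O}$. The function name `index_triplets` and the toy data are hypothetical; the chapter does not prescribe any particular preprocessing.

```python
from typing import Dict, List, Set, Tuple

RawTriplet = Tuple[str, str, str]      # (user name, image id, tag string)
IndexTriplet = Tuple[int, int, int]    # (u, i, t) integer indices

def index_triplets(raw: List[RawTriplet]):
    """Assign integer indices to users, images and tags and build the set O."""
    users: Dict[str, int] = {}
    images: Dict[str, int] = {}
    tags: Dict[str, int] = {}
    observed: Set[IndexTriplet] = set()
    for user, image, tag in raw:
        # setdefault gives each new name the next free index (0, 1, 2, ...)
        u = users.setdefault(user, len(users))
        i = images.setdefault(image, len(images))
        t = tags.setdefault(tag, len(tags))
        # O is a set, so a repeated annotation is stored only once
        observed.add((u, i, t))
    return users, images, tags, observed

# Usage on hypothetical toy data: alice tags img1 with "beach" twice.
raw = [
    ("alice", "img1", "beach"),
    ("alice", "img1", "sunset"),
    ("bob", "img1", "beach"),
    ("bob", "img2", "cat"),
    ("alice", "img1", "beach"),
]
users, images, tags, observed = index_triplets(raw)
print(len(users), len(images), len(tags), len(observed))  # 2 2 3 4
```

Because $\mathbb{O}$ is a set, the duplicate annotation contributes a single triplet, matching the definition of the observed data as a subset of $\mathbb{U} \times \mathbb{I} \times \mathbb{T}$.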

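The ternary interrelations can be viewed as a three-mode cube whose modes are the user, image, and tag. Therefore, we can induce a three-dimensional tensor $\mathcal{Y} \in \mathbb{R}^{|\mathbb{U}| \times |\mathbb{I}| \times |\mathbb{T}|}$, which is defined as:

$$
y_{u,i,t} =
\begin{cases}
1 & \text{if } (u, i, t) \in \mathbb{O} \\
0 & \text{otherwise}
\end{cases}
\tag{2.1}
$$

where $|\mathbb{U}|$, $|\mathbb{I}|$, and $|\mathbb{T}|$ are the numbers of distinct users, images, and tags, respectively.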
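A minimal NumPy sketch of Eq. (2.1), assuming the observed triplets have already been mapped to integer indices; the helper name `build_tagging_tensor` and the toy data are hypothetical. A dense array is used only for readability: real tagging data are extremely sparse, so a sparse coordinate representation would likely be preferable in practice.

```python
import numpy as np

def build_tagging_tensor(observed, n_users, n_images, n_tags):
    """Dense binary tensor of Eq. (2.1): y[u, i, t] = 1 iff (u, i, t) is in O."""
    y = np.zeros((n_users, n_images, n_tags), dtype=np.int8)
    if observed:
        # Split the triplets into three index arrays and set all entries at once.
        u_idx, i_idx, t_idx = (np.array(axis) for axis in zip(*observed))
        y[u_idx, i_idx, t_idx] = 1
    return y

# Usage with indexed triplets for 2 users, 2 images and 3 tags (toy data).
observed = {(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 2)}
y = build_tagging_tensor(observed, n_users=2, n_images=2, n_tags=3)
print(y.shape, int(y.sum()))  # (2, 2, 3) 4
```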

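To jointly model the user, image, and tag factors, we employ a general tensor factorization model, Tucker decomposition, for latent factor inference. In Tucker decomposition, the tagging data $\mathcal{Y} \in \mathbb{R}^{|\mathbb{U}| \times |\mathbb{I}| \times |\mathbb{T}|}$ are estimated by three low-rank matrices $U$, $I$, $T$ and one core tensor $\mathcal{C}$:

$$
\hat{\mathcal{Y}} := \mathcal{C} \times_u U \times_i I \times_t T
\tag{2.2}
$$

where $\times_u$, $\times_i$, and $\times_t$ denote the tensor-matrix products along the user, image, and tag modes, respectively.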
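The reconstruction in Eq. (2.2) can be sketched in NumPy with an explicit mode-n product. This is an illustration under stated assumptions, not the chapter's inference procedure: the core tensor and factor matrices below are random rather than learned, and the ranks $r_U$, $r_I$, $r_T$ (`r_u`, `r_i`, `r_t`) as well as the helper names `mode_product` and `tucker_reconstruct` are hypothetical. Each row of a factor matrix plays the role of the latent representation of one user, image, or tag.

```python
import numpy as np

def mode_product(tensor: np.ndarray, matrix: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n product: multiply `tensor` along axis `mode` by `matrix`.

    `matrix` has shape (J, R) with R == tensor.shape[mode]; the result has
    size J along `mode` and is unchanged along the other axes.
    """
    # tensordot sums over matrix axis 1 and tensor axis `mode`,
    # placing the new axis (size J) first ...
    result = np.tensordot(matrix, tensor, axes=(1, mode))
    # ... so move it back to position `mode`.
    return np.moveaxis(result, 0, mode)

def tucker_reconstruct(core, user_factors, image_factors, tag_factors):
    """Y_hat = C x_u U x_i I x_t T, with modes ordered (user, image, tag)."""
    y_hat = mode_product(core, user_factors, 0)
    y_hat = mode_product(y_hat, image_factors, 1)
    return mode_product(y_hat, tag_factors, 2)

# Usage with random (not learned) factors; sizes and ranks are hypothetical.
rng = np.random.default_rng(0)
n_users, n_images, n_tags = 4, 5, 6
r_u, r_i, r_t = 2, 3, 2
core = rng.standard_normal((r_u, r_i, r_t))
U = rng.standard_normal((n_users, r_u))
I = rng.standard_normal((n_images, r_i))
T = rng.standard_normal((n_tags, r_t))

y_hat = tucker_reconstruct(core, U, I, T)
print(y_hat.shape)  # (4, 5, 6)
# Elementwise form: y_hat[u, i, t] = sum_{a,b,c} core[a,b,c] * U[u,a] * I[i,b] * T[t,c]
assert np.allclose(y_hat, np.einsum("abc,ua,ib,tc->uit", core, U, I, T))
```

The `einsum` check spells out the elementwise form of the Tucker model, $\hat{y}_{u,i,t} = \sum_{a,b,c} c_{a,b,c}\, U_{u,a}\, I_{i,b}\, T_{t,c}$, which shows how every predicted user-image-tag entry mixes the three latent representations through the core tensor.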
