where ×_n is the tensor product of multiplying a matrix on mode n. Each low-rank matrix (U ∈ R^{|U|×r_U}, I ∈ R^{|I|×r_I}, T ∈ R^{|T|×r_T}) corresponds to one factor. The core tensor C ∈ R^{r_U×r_I×r_T} contains the interactions between the different factors. The ranks of the decomposed factors are denoted by r_U, r_I, r_T, and Eq. (2.2) is called the rank-(r_U, r_I, r_T) Tucker decomposition. An intuitive interpretation of Eq. (2.2) is that the tagging data depends not only on how similar an image's visual features and a tag's semantics are, but also on how much these features/semantics match the users' preferences.
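Reconstructing the tensor from the core tensor and the three factor matrices amounts to three mode-n products. The following NumPy sketch illustrates this; the sizes, ranks and random data are illustrative assumptions, not values from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 users, 5 images, 6 tags, ranks (2, 3, 3).
nU, nI, nT = 4, 5, 6
rU, rI, rT = 2, 3, 3

U = rng.standard_normal((nU, rU))      # user factors,  U in R^{|U| x r_U}
I = rng.standard_normal((nI, rI))      # image factors, I in R^{|I| x r_I}
T = rng.standard_normal((nT, rT))      # tag factors,   T in R^{|T| x r_T}
C = rng.standard_normal((rU, rI, rT))  # core tensor,   C in R^{r_U x r_I x r_T}

# Y_hat = C x_u U x_i I x_t T: each mode-n product multiplies a factor
# matrix against the n-th axis of the core tensor.
Y_hat = np.einsum('abc,ua,ib,tc->uit', C, U, I, T)
print(Y_hat.shape)  # (4, 5, 6)
```

A single entry Y_hat[u, i, t] is the sum over all core entries C[a, b, c] weighted by U[u, a], I[i, b] and T[t, c], which is exactly the multilinear interpretation discussed above.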
Typically, the latent factors U, I, T can be inferred by directly approximating Y, and the tensor factorization problem is reduced to minimizing a point-wise loss on Y:

min_{U,I,T,C} Σ_{(u,i,t)∈|U|×|I|×|T|} (ŷ_{u,i,t} − y_{u,i,t})²   (2.3)

where ŷ_{u,i,t} = C ×_u U_u ×_i I_i ×_t T_t, with U_u, I_i, T_t the rows of U, I, T corresponding to user u, image i and tag t. As this optimization scheme tries to fit the numerical values of 1 and 0, we refer to it as the 0/1 scheme. To alleviate the sparsity problem and better utilize the tagging data, in this chapter we propose RMTF for factor inference, which is detailed in Sect. 2.3.1.
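Under the 0/1 scheme, the objective in Eq. (2.3) is simply the squared difference between each binary entry y_{u,i,t} and the multilinear prediction ŷ_{u,i,t}, summed over every cell. A minimal sketch of the per-entry prediction and the total loss (all sizes and data are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
nU, nI, nT = 4, 5, 6
rU, rI, rT = 2, 3, 3

U = rng.standard_normal((nU, rU))
I = rng.standard_normal((nI, rI))
T = rng.standard_normal((nT, rT))
C = rng.standard_normal((rU, rI, rT))
# Sparse binary tagging tensor: mostly 0s, a few 1s.
Y = (rng.random((nU, nI, nT)) < 0.1).astype(float)

def y_hat(u, i, t):
    """Multilinear prediction C x_u U_u x_i I_i x_t T_t for one (u, i, t)."""
    return np.einsum('abc,a,b,c->', C, U[u], I[i], T[t])

# Point-wise loss of Eq. (2.3), evaluated over every cell of Y.
full_hat = np.einsum('abc,ua,ib,tc->uit', C, U, I, T)
loss = np.sum((full_hat - Y) ** 2)
print(loss >= 0)  # True
```

Fitting every 0 as well as every 1 is what makes this scheme struggle with sparse tagging data, which motivates the ranking-based alternative proposed in Sect. 2.3.1.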
Tag Refinement. From the perspective of subspace learning, the derived factor matrices U, I, T can be viewed as the feature representations on the latent user, image and tag subspaces, respectively. Each row of a factor matrix corresponds to one object (user, image or tag). The core tensor C defines a multilinear operation and captures the interactions among the different subspaces. Therefore, multiplying a factor matrix with the core tensor amounts to a change of basis. We define
T^{UI} := C ×_t T   (2.4)

then T^{UI} ∈ R^{r_U×r_I×|T|} can be explained as the tags' feature representations on the user × image subspace. Each r_U × r_I matrix slice corresponds to one tag's feature representation. By summing T^{UI} over the user dimension, we can obtain the tags' representations on the image subspace. Therefore, the cross-space image-tag association matrix X^{IT} ∈ R^{|I|×|T|} can be calculated as³:

X^{IT} = I · (T^{UI} ×_u 1_{r_U})   (2.5)

where 1_{r_U} denotes the all-ones vector of length r_U, so the mode-u product sums out the user dimension.
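The two projections in Eqs. (2.4)-(2.5) can be sketched in NumPy: the mode-t product attaches one r_U × r_I slice to each tag, and summing each slice over the user axis before projecting onto the image factors yields X^{IT}. All sizes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
nI, nT = 5, 6
rU, rI, rT = 2, 3, 3

I = rng.standard_normal((nI, rI))      # image factor matrix
T = rng.standard_normal((nT, rT))      # tag factor matrix
C = rng.standard_normal((rU, rI, rT))  # core tensor

# Eq. (2.4): T_UI = C x_t T, in R^{r_U x r_I x |T|}.
T_UI = np.einsum('abc,tc->abt', C, T)

# Eq. (2.5): summing over the user axis (mode-u product with the all-ones
# vector), then projecting onto the image subspace via I.
X_IT = I @ T_UI.sum(axis=0)  # |I| x |T| image-tag association matrix
print(X_IT.shape)  # (5, 6)
```

Each column of X_IT scores one tag against every image, so ranking a row of X_IT ranks the tags for that image.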
The tags with the K highest associations to image i are reserved as the final annotations:
³In practice, for new images not in the training dataset, we can approximate their positions in the learnt image subspace by using approximated eigenfunctions based on the kernel trick [2].
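The final top-K selection described above can be sketched as follows; the association matrix, K and the image index are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
X_IT = rng.standard_normal((5, 6))  # illustrative |I| x |T| association matrix
K = 3

# For image i, keep the K tags with the highest association scores.
i = 0
top_k_tags = np.argsort(X_IT[i])[::-1][:K]  # tag indices, best first
print(len(top_k_tags))  # 3
```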