5.4.2 Topic Association
5.4.2.1 Transition Probability-Based Topic Association
Given the derived heterogeneous topic spaces, topic association aims to discover the correlation between them, i.e., an association matrix $A$. Recall the basic idea: if many overlapped users who are interested in the $i$-th YouTube topic also follow the $j$-th Twitter topic, the association $a_{ij}$ between the two topics tends to be strong. By examining the collaborative involvement of cross-network topics among overlapped users, we view topic association as a probabilistic transition problem and calculate the association matrix $A$ by aggregating over all the overlapped users:
$$a_{ij} = p(z_j^{Twi} \mid z_i^{You}) = \sum_{u \in \mathcal{U}} p(z_j^{Twi} \mid u) \cdot p(u \mid z_i^{You})$$
where the prior $p(z_i^{You} \mid u)$ indicates the $i$-th YouTube topic distribution for user $u$. After computing $a_{ij}$ for all cross-network topic pairs and normalizing, we obtain the topic association matrix $A = \{a_{ij}\}$.
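The aggregation over overlapped users can be illustrated with a minimal NumPy sketch. It rests on assumptions that are not stated in the text: the per-user topic distributions are stored row-wise in two hypothetical arrays `p_you` and `p_twi`, and $p(u \mid z_i^{You})$ is obtained from the prior $p(z_i^{You} \mid u)$ by Bayes' rule with a uniform distribution over users. The function name and the `eps` guard are likewise illustrative.

```python
import numpy as np

def transition_association(p_you, p_twi, eps=1e-12):
    """Sketch of a_ij = sum_u p(z_j^Twi | u) * p(u | z_i^You).

    p_you: (n_users, K_you), row u holds p(z^You | u)
    p_twi: (n_users, K_twi), row u holds p(z^Twi | u)
    Returns A with shape (K_you, K_twi).
    """
    p_you = np.asarray(p_you, dtype=float)
    p_twi = np.asarray(p_twi, dtype=float)
    # Assumption: uniform p(u), so p(u | z_i) is p(z_i | u) normalised over users.
    p_u_given_you = p_you / (p_you.sum(axis=0, keepdims=True) + eps)
    # Aggregate over all overlapped users.
    A = p_u_given_you.T @ p_twi
    # Row normalisation, so each YouTube topic yields a distribution
    # over Twitter topics.
    return A / (A.sum(axis=1, keepdims=True) + eps)

# Toy data: 3 overlapped users, 2 YouTube topics, 2 Twitter topics.
p_you = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
p_twi = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
A = transition_association(p_you, p_twi)
print(np.round(A, 3))  # approximately [[0.8, 0.2], [0.2, 0.8]]
```

In the toy data the first two users favor YouTube topic 0 and mostly follow Twitter topic 0, so the first row of `A` concentrates on Twitter topic 0. Under the uniform-user assumption the rows already sum to one before the final step, which then only acts as a numerical safeguard.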
5.4.2.2 Regression-Based Topic Association
The above probability-based method aggregates directly over all overlapped users, so noisy user topic distributions will degrade the derived association matrix. An alternative way to obtain the association matrix is to formulate it as an optimization problem. Specifically, we interpret topic association as a linear regression between the two user distribution matrices $U^{You}$ and $U^{Twi}$.
Formally, the regression objective function is:
$$\min_{A} \; \| U^{Twi} - A U^{You} \|^2 + \lambda_1 \| A \|_q \qquad (5.4)$$
where the first term represents the regression error, the second term is the regularization penalty used to avoid overfitting, and $\lambda_1 \in [0, 1]$ is the weighting parameter.
When $q = 1$, Eq. (5.4) is a lasso problem and can be effectively solved by LARS [10]. When $q = 2$, Eq. (5.4) is a ridge regression problem with the analytical solution:
$$A = U^{Twi} {U^{You}}^{T} \left( U^{You} {U^{You}}^{T} + \lambda_1 I \right)^{-1} \qquad (5.5)$$
where $I$ is the identity matrix. We denote the regression-based association strategies with $q = 1$ and $q = 2$ as Regression_l1 and Regression_l2, respectively.
 