what-when-how
In Depth Tutorials and Information
Figure 3.1 (Continued) Comparison between real data and the predicted results. [Plot of predicted data against real data; horizontal axis: Time (day), 0 to 40; vertical axis: 0 to 140.]
3.2.2.3 Predicting the Posting Behavior Based on a Machine-Learning Approach
Besides the foregoing, Chen et al. [37] also built a social network and profile-based blogging-behavior model to predict posting behavior. Based on the social-network and profile-based blogging-behavior features $\langle T_z, T_z^p(j), C_z^p(j), S_z(j) \rangle$ for blogger $j$, they trained the model and predicted the future blogging behaviors of blogger $j$ by using regression techniques. The details of the features are described in the following text.
Topic distribution vector $T_z$: For each time window $z$, the content of the blog entries is represented as a topic distribution vector $T_z = \langle t_1, t_2, t_3, \ldots, t_n \rangle_z$ that represents the distribution of blog entries with respect to the list of topics, where $n$ is the number of topics and $t_i$ represents the weight of the $i$-th topic within time window $z$. The $i$-th component of a topic distribution vector can be calculated as the total number of blog entries belonging to the $i$-th topic divided by the total number of blog entries in time window $z$.
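The calculation of the topic distribution vector described above can be sketched in a few lines of Python. This is only an illustration, not the implementation of Chen et al.: the function name `topic_distribution`, the 0-based topic indices, the assumption that each blog entry carries exactly one topic label, and the choice to return all zeros for an empty window are hypothetical.

```python
from collections import Counter


def topic_distribution(entry_topics, n_topics):
    """Return [t_1, ..., t_n] for one time window.

    entry_topics holds one topic index (0-based, an assumption of this
    sketch) per blog entry posted in the window.
    """
    total = len(entry_topics)
    if total == 0:
        # Assumption: a window without entries maps to an all-zero vector.
        return [0.0] * n_topics
    counts = Counter(entry_topics)
    # i-th component = entries in topic i / all entries in the window
    return [counts[i] / total for i in range(n_topics)]


# Hypothetical window: five entries spread over four topics.
window_entries = [0, 2, 2, 1, 2]
print(topic_distribution(window_entries, 4))
```

For a non-empty window the components sum to 1, so the vector can be read directly as a distribution over the topics.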
Personal topic distribution vector $T_z^p(j)$: For the profile-based topic distribution, Chen et al. have proposed to add the personal topic distribution vector $T_z^p(j) = \langle t_{1j}, t_{2j}, t_{3j}, \ldots, t_{nj} \rangle_z$ to the general blogging-behavior features, where $t_{1j}$ represents the distribution of topic 1 for blogger $j$ within time window $z$. Here the weight of $t_{1j}$ is calculated as the percentage of blog entries posted by blogger $j$ belonging to topic 1 (denoted as $|t_{1j}|$) against the total number of blog entries posted by blogger $j$ (denoted as $|t_j|$) in the time window $z$.
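The personal topic distribution vector differs from the general one only in that the counts are restricted to a single blogger. A minimal sketch follows, assuming the entries of one time window are available as (blogger, topic) pairs; the function name `personal_topic_distribution`, the data layout, and the blogger names are hypothetical, and the weights are returned as fractions rather than percentages.

```python
from collections import Counter


def personal_topic_distribution(entries, blogger, n_topics):
    """Return [t_1j, ..., t_nj] for blogger j in one time window.

    entries is a list of (blogger_id, topic_index) pairs for the window;
    topic indices are 0-based (an assumption of this sketch).
    """
    own_topics = [topic for who, topic in entries if who == blogger]
    if not own_topics:
        # Assumption: a blogger with no entries gets an all-zero vector.
        return [0.0] * n_topics
    counts = Counter(own_topics)
    # t_ij = |t_ij| / |t_j|: the blogger's entries in topic i over all
    # of the blogger's entries in the window
    return [counts[i] / len(own_topics) for i in range(n_topics)]


# Hypothetical window with two bloggers and three topics.
entries = [("alice", 0), ("bob", 1), ("alice", 2), ("alice", 2), ("bob", 0)]
print(personal_topic_distribution(entries, "alice", 3))
```

Here two of the three entries by the hypothetical blogger "alice" belong to the third topic, so that component is 2/3 and the first component is 1/3.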