- Node $V$: Each node in the set $V$ represents a unique user's Twitter ID.
- Edge $E$: An edge $(i, j) \in E$ between two nodes $v_i$ and $v_j$ represents a retweet relation. Retweet relations are derived by combining the following relations between users with the retweet times.
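The text does not spell out how following relations and retweet times are combined. One common heuristic, used here purely as an assumption, attributes each retweet to the most recent earlier participant (the original poster or an earlier retweeter) whom the retweeter follows, and falls back to the original poster otherwise. The function name and input layout below are hypothetical; this is a sketch, not the authors' construction.

```python
from typing import Dict, List, Set, Tuple


def build_message_passing_graph(
    poster: str,
    post_time: float,
    retweets: List[Tuple[str, float]],
    follows: Dict[str, Set[str]],
) -> Set[Tuple[str, str]]:
    """Infer directed message-passing edges (source, retweeter) for one tweet.

    Hypothetical rule (an assumption): a retweeter received the tweet from
    the most recent earlier participant they follow; if they follow none
    of the earlier participants, the edge is attributed to the poster.
    """
    edges: Set[Tuple[str, str]] = set()
    # Participants who already hold the tweet, kept in time order.
    seen: List[Tuple[str, float]] = [(poster, post_time)]
    for user, t in sorted(retweets, key=lambda r: r[1]):
        followed = follows.get(user, set())
        # Earlier participants (other than the user) that this user follows.
        candidates = [(v, tv) for v, tv in seen
                      if tv <= t and v != user and v in followed]
        if candidates:
            source = max(candidates, key=lambda c: c[1])[0]
        else:
            source = poster
        edges.add((source, user))
        seen.append((user, t))
    return edges


edges = build_message_passing_graph(
    "a", 0.0,
    [("b", 1.0), ("c", 2.0), ("d", 3.0)],
    {"b": {"a"}, "c": {"b"}, "d": {"x"}},
)
print(sorted(edges))
```

Here `c` is linked to `b` because `c` follows `b`, while `d` follows nobody in the cascade and is attached directly to the poster. Under this rule, accounts that retweet automatically without following the earlier participants all collapse onto the poster, which is one way the message-passing graph can differ from the follower-based Twitter graph.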
As shown in Fig. 2(a) and 2(b), the message-passing graph of a spam tweet looks different from its Twitter graph, whereas for a normal tweet the two graphs look alike. This difference is a useful signal for spam detection.
(a) Twitter graph  (b) Message-passing graph
Fig. 2. A spam tweet posted by @followback 707. (a) The original graph built with the Twitter API cannot highlight automatic retweet behavior. (b) The automatic retweet behavior of the spam collusion accounts is preserved in the message-passing graph.
3.3 Time Evolution Features Extraction
Graph-Based Features. Since we treat the whole Twitter social network as a directed graph $G(V, E)$, we can use several graph-based features:
- Average clustering coefficient: The clustering coefficient $C_u$ of a node $u$ is defined in Equation 1:
$$C_u = \frac{2T_u}{\deg(u)\,(\deg(u) - 1)} \qquad (1)$$
where $u \in V$, $\deg(u)$ is the number of neighbours of $u$, and $T_u$ is the number of connected pairs among the neighbours of $u$ [22,2]. Spammers or spam collusion accounts usually retweet blindly the tweets posted within small communities, so their retweet relations overlap with very high probability. Instead of the per-node clustering coefficient, we use the average clustering coefficient of the graph, given in Equation 2:
$$AC = \frac{1}{n} \sum_{u \in G} C_u \qquad (2)$$
where $n$ is the number of nodes in $G$.
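Equations 1 and 2 can be computed directly from an edge list. The sketch below assumes that neighbours are counted without regard to edge direction (the retweet edges are symmetrised) and that a node with fewer than two neighbours gets $C_u = 0$, since Equation 1 is undefined there; both are assumptions, and the helper names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set, Tuple

Adjacency = Dict[Hashable, Set[Hashable]]


def undirected_adjacency(edges: Iterable[Tuple[Hashable, Hashable]]) -> Adjacency:
    """Symmetrise directed edges (i, j) into an undirected neighbour map."""
    adj: Adjacency = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def clustering_coefficient(adj: Adjacency, u: Hashable) -> float:
    """Equation 1: C_u = 2 T_u / (deg(u) (deg(u) - 1))."""
    neighbours = adj[u]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention when deg(u) < 2
    # T_u: number of connected pairs among u's neighbours.
    t_u = sum(1 for v, w in combinations(neighbours, 2) if w in adj[v])
    return 2.0 * t_u / (k * (k - 1))


def average_clustering(adj: Adjacency) -> float:
    """Equation 2: AC = (1/n) * sum of C_u over all n nodes."""
    n = len(adj)
    if n == 0:
        return 0.0
    return sum(clustering_coefficient(adj, u) for u in adj) / n


star = undirected_adjacency([("p", "r1"), ("p", "r2"), ("p", "r3")])
clique = undirected_adjacency(combinations("abcd", 2))
print(average_clustering(star))    # 0.0
print(average_clustering(clique))  # 1.0
```

A star-shaped cascade, in which every account retweets the same poster and no two retweeters are linked, gives $AC = 0$, while a fully interconnected group of four accounts gives $AC = 1$, matching the intuition that colluding accounts with overlapping retweet relations produce high clustering. For undirected graphs, NetworkX's `average_clustering` should return the same values and can serve as a cross-check.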