account has followed 2,000 users, the number of additional accounts it can follow
is limited by its number of followers [21]. To get around the 2,000-following limit,
spammers began to become each other's followers, and spam collusions formed. We
observed that the retweet behaviour of spam collusion accounts can be classified
into the following two types:
a) Retweeting only large volumes of spam tweets from a specific group of spammers.
b) Retweeting a large number of spam tweets and also posting spam tweets.
Another important anti-spam rule limits the maximum number of tweets an account
may post per day. Because of this limit, spammers cannot keep sending junk tweets.
To expose more victims to spam tweets, spammers therefore use spam collusion
accounts. We can use this behavioral characteristic to recognize Twitter spam.
Kwak et al. [9] discovered that half of retweets occur within an hour, and 75%
within a day. Their observation indicates a "window of survival", from one
hour to one day, during which a tweet has a higher chance of being retweeted [13].
Therefore, to capture the time evolution features, we analyze the information
diffusion within one hour. We then propose our system, called Message-passing
Graph Analyzer (Figure 1), to recognize Twitter spam despite the spam collusion
mask.
Fig. 1. The system architecture of our proposed system.
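The one-hour observation window described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `bucket_retweets`, the use of epoch-second timestamps, and the 10-minute interval length are all assumptions, since the text fixes only the one-hour window.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600    # one-hour observation window, as stated in the text
INTERVAL_SECONDS = 600   # assumed 10-minute slices; the text does not give this value


def bucket_retweets(source_time, retweets):
    """Group (retweeter_id, timestamp) pairs into intervals inside the window.

    Timestamps are seconds since the epoch. Retweets that fall outside
    the one-hour window after the source tweet are dropped.
    """
    buckets = defaultdict(list)
    # Process retweets in chronological order so each bucket keeps the time sequence.
    for retweeter_id, ts in sorted(retweets, key=lambda r: r[1]):
        elapsed = ts - source_time
        if 0 <= elapsed < WINDOW_SECONDS:
            buckets[int(elapsed // INTERVAL_SECONDS)].append(retweeter_id)
    return dict(buckets)
```

Each bucket index counts intervals since the source tweet, so the retweeters up to a given index are the accounts the message had reached by that point.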
The first component is Message-passing Graph Construction. Through the
Twitter API [20] we obtain tweet information. We use the unique ID of
the source tweet to aggregate tweets into a group. We then extract each retweeter's
Twitter ID and the time at which the retweeter retweeted the source tweet. We also add
following relations to build the message-passing graph. The second component
is Time Evolution Features Extraction. We extract three graph-structure-based
features from the graph: degree variance, clustering coefficient, and
the number of triangles in the graph. We then combine the features of the same
tweet across different time intervals into a time evolution feature. The last
part of the system is Spam Detection, which uses labeled spammers' and normal
accounts' information to train the model.
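The first two components can be sketched in plain Python under several stated assumptions. The text does not define edge direction or how per-interval features are combined. This sketch therefore treats the graph as undirected, with an edge whenever either account follows the other. It uses the population variance of node degrees, averages the local clustering coefficient over nodes, and concatenates per-interval features. The names `build_graph`, `graph_features`, and `time_evolution_feature` are hypothetical.

```python
from itertools import combinations
from statistics import pvariance


def build_graph(account_ids, follows):
    """Return an undirected adjacency map over the given accounts.

    `follows` is a set of (follower, followee) pairs. An edge is added when
    either account follows the other (an assumption of this sketch).
    """
    nodes = set(account_ids)
    adj = {n: set() for n in nodes}
    for u, v in follows:
        if u in nodes and v in nodes and u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def graph_features(adj):
    """Return (degree variance, average clustering coefficient, triangle count)."""
    if not adj:
        return (0.0, 0.0, 0)
    degrees = [len(nbrs) for nbrs in adj.values()]
    degree_variance = float(pvariance(degrees))

    closed_total = 0      # connected neighbour pairs, summed over all nodes
    clustering_sum = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        closed_total += links
        if k >= 2:
            clustering_sum += 2 * links / (k * (k - 1))
    triangles = closed_total // 3  # each triangle is seen once from each corner
    return (degree_variance, clustering_sum / len(adj), triangles)


def time_evolution_feature(snapshots):
    """Concatenate the features of successive graph snapshots of one tweet."""
    vector = []
    for adj in snapshots:
        vector.extend(graph_features(adj))
    return vector
```

Concatenation keeps the order of the intervals, so a downstream classifier can pick up how the graph structure changes over time. Other ways of combining the features, such as differences between consecutive intervals, would fit the same interface.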
3.2 Message-Passing Graph Construction
We use the time sequence of retweeters' IDs to create a message-passing graph
representation of the Twitter social network in the form of G(V, E), where
 