Detecting Spam on Twitter via Message-Passing Based on Retweet-Relation - Technologies and Applications of Artificial Intelligence


our system is to identify spam. We use labeled account information and employ the Singular Value Decomposition (SVD) method to train the latent semantic indexing model, as shown in Equation 7:

$$M = U \times \Sigma \times V^{T} \tag{7}$$

where U and V are orthogonal matrices and $\Sigma$ is the diagonal matrix of singular values. For a tweet to be added, we extract its time evolution feature $\omega_{add}$, then use Equation 8 to convert its coordinates.

$$\omega_{convert} = \omega_{add} \times (V^{T})^{-1} \times (\Sigma)^{-1} \tag{8}$$
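
The decomposition in Equation 7 and the coordinate conversion in Equation 8 can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names `fit_lsi` and `fold_in`, the truncation rank `k`, and the layout of M (one row per labeled sample, one column per feature) are all hypothetical. Because V has orthonormal columns, the sketch takes $(V^{T})^{-1}$ to be V.

```python
import numpy as np


def fit_lsi(M, k):
    """Thin SVD of M, truncated to the k largest singular values (Equation 7).

    Assumption: rows of M are labeled samples, columns are features.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(omega_add, s, Vt):
    """Map a new feature vector into the latent space (Equation 8).

    V has orthonormal columns, so Vt.T plays the role of (V^T)^{-1};
    dividing by s applies the inverse of the diagonal matrix Sigma.
    """
    return omega_add @ Vt.T / s


# Hypothetical data: 6 samples with 4 features each.
rng = np.random.default_rng(0)
M = rng.random((6, 4))
U, s, Vt = fit_lsi(M, k=2)
omega_convert = fold_in(M[0], s, Vt)
```

One way to sanity-check such a sketch is to fold in a row that is already in M: the result should coincide with the corresponding row of U.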

4 Experiments

4.1 Experiment Design

We use WEKA [23] to run the clustering and classification algorithms. In the Message-passing Graph Analyzer, we set the unit interval to five minutes. We also ran experiments with different parameter settings to show the sensitivity of our system. We collected the retweets by searching for specific keywords in the Twitter stream [17]. About 2.3 million spam tweets and 2.7 million normal tweets were collected between May 14, 2014 and July 15, 2014. The data collected before June 22 were used for training, and the remaining data were used for testing. The rules we used to identify spam were as follows:

a) A few weeks later, we re-examined whether the account had been suspended

or not. If it had been suspended by Twitter, we labeled it as spam.

b) The profile of the account contained keywords or hashtags such as “follow”.

c) The screen name of the account contained keywords such as “follow” or “followback”.

d) Some spammers used obfuscated words in their screen names. For example, they replaced the character “o” with a zero, e.g., F0llowerz.

e) The screen name had the form “@follow” followed by a number, which suggested that the account could have been generated by a script [8].
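
Rules b) through e) are pattern checks on the profile and screen name, so they can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `matched_spam_rules` is hypothetical, the keyword list contains just the two examples named in the text (the full list is not given), and the only character substitution handled is zero for "o". Rule a) depends on account suspension status and is not covered.

```python
import re

# Only the examples named in the text; the authors' full keyword list is unknown.
KEYWORDS = ("follow", "followback")


def undo_zero_substitution(name):
    """Lowercase the name and map the digit zero back to the letter 'o'."""
    return name.lower().replace("0", "o")


def matched_spam_rules(screen_name, profile=""):
    """Return the letters of the rules (b to e) that the account matches."""
    matched = []
    lowered = screen_name.lower().lstrip("@")
    if any(k in profile.lower() for k in KEYWORDS):
        matched.append("b")  # keyword or hashtag in the profile
    if any(k in lowered for k in KEYWORDS):
        matched.append("c")  # keyword in the screen name
    elif any(k in undo_zero_substitution(lowered) for k in KEYWORDS):
        matched.append("d")  # keyword hidden by replacing 'o' with zero
    if re.fullmatch(r"follow\d+", lowered):
        matched.append("e")  # "follow" followed by a number
    return matched
```

For example, `matched_spam_rules("F0llowerz")` matches only rule d), while a screen name such as `follow123` matches both c) and e).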

In addition, we also needed to identify normal tweets. The rules we used to identify normal tweets were as follows:

a) We checked whether the Twitter profile of the account carried the blue verified badge [19]. If it did, we labeled the account as normal.

b) We manually checked whether the contents of the tweets were legitimate or spam.
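
The data handling described at the start of this section (a five-minute unit interval and a training/testing split at June 22) can be sketched as follows. The names `split_of` and `interval_index` are hypothetical. The sketch also assumes that the cutoff falls at midnight on June 22, 2014 (the year is inferred from the collection window, and no time zone is stated) and that intervals are counted from some reference time such as the timestamp of the original tweet; the text does not confirm either choice.

```python
from datetime import datetime, timedelta

# Assumed cutoff: midnight on June 22, 2014; the time zone is not stated in the text.
TRAIN_CUTOFF = datetime(2014, 6, 22)
UNIT_INTERVAL = timedelta(minutes=5)


def split_of(created_at):
    """Assign a tweet to the training or testing set by its timestamp."""
    return "train" if created_at < TRAIN_CUTOFF else "test"


def interval_index(created_at, origin):
    """Zero-based index of the five-minute interval containing created_at.

    Assumption: origin is a reference time such as the original tweet's timestamp.
    """
    return (created_at - origin) // UNIT_INTERVAL
```

A retweet posted 12 minutes after the reference time falls in interval 2 under this scheme, since intervals 0 and 1 cover the first ten minutes.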
