our system is to identify spam. We use labeled account information and employ
the Singular Value Decomposition (SVD) method to train the latent semantic
indexing model, as shown in Equation 7:
$$M = U \times \Sigma \times V^{T} \tag{7}$$
where $U$ and $V$ are orthogonal matrices and $\Sigma$ is the diagonal matrix
of singular values. For a tweet to be added, we extract the time evolution feature
$\omega_{add}$, then use Equation 8 to convert its coordinates.
$$\omega_{convert} = \omega_{add} \times (V^{T})^{-1} \times (\Sigma)^{-1} \tag{8}$$
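As an illustration of Equations 7 and 8, the following is a minimal sketch that assumes NumPy, a toy matrix whose rows are labeled accounts and whose columns are time evolution features, and a truncation rank `k`. None of these details are given in the text, and the function names `fit_lsi` and `fold_in` are hypothetical. Because the rows of $V^{T}$ are orthonormal, $V$ serves as $(V^{T})^{-1}$ (exactly when $V$ is square, as a pseudo-inverse when truncated).

```python
import numpy as np

def fit_lsi(M, k):
    """Equation 7: M = U x Sigma x V^T, truncated to the top k singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(omega_add, s, Vt):
    """Equation 8: omega_convert = omega_add x (V^T)^{-1} x Sigma^{-1}.

    Vt.T stands in for (V^T)^{-1}; dividing by s applies Sigma^{-1},
    since Sigma is diagonal.
    """
    return (omega_add @ Vt.T) / s

# Toy data (made up for illustration): 6 accounts x 4 features.
rng = np.random.default_rng(0)
M = rng.random((6, 4))

U, s, Vt = fit_lsi(M, k=2)

# Pretend a row already present in M arrives as a new tweet's feature vector.
omega_add = M[2]
omega_convert = fold_in(omega_add, s, Vt)
```

Folding in a row that was already part of `M` should reproduce the corresponding row of `U`, which gives a quick sanity check that the conversion places new tweets in the same latent space as the training data.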
4 Experiments
4.1 Experiment Design
We use WEKA [23] to run the clustering and classification algorithms. In the
Message-passing Graph Analyzer, we set the unit interval to five minutes. We
ran the experiments with different parameter settings to show the sensitivity
of our system. By searching for specific keywords in the Twitter stream [17],
we collected the retweets: about 2.3 million spam tweets and 2.7 million
normal tweets between May 14, 2014 and July 15, 2014. The data collected
before June 22 were used for training, and the remaining data were used for
testing. The rules we used to identify spam were
as follows:
a) A few weeks later, we re-examined whether the account had been suspended
or not. If it had been suspended by Twitter, we labeled it as spam.
b) The profile of the account contained keywords or hashtags such as “follow”.
c) The screen name of the account contained keywords such as “follow” or
“followback”.
d) Some spammers used obfuscated spellings in their screen names. For example,
they replaced the letter o with a zero, e.g., F0llowerz.
e) The screen name followed the pattern @follow plus a number, which
suggests that the account could have been generated by a script [8].
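The rules above could be approximated in code roughly as follows. This is only a sketch: the keyword list, the single zero-for-o substitution, and the exact regular expression for rule e are assumptions made for illustration, since the text names only “follow”, “followback”, and the example F0llowerz. The function `spam_reasons` and its parameters are hypothetical.

```python
import re

# Assumed keyword list; the text only names "follow" and "followback".
SPAM_KEYWORDS = ("followback", "follow")
# Rule d: undo the zero-for-"o" substitution (only this one is mentioned).
LEET_MAP = str.maketrans("0", "o")
# Rule e: assumed form, "follow" immediately followed by digits.
SCRIPTED_NAME = re.compile(r"^follow\d+$", re.IGNORECASE)

def has_keyword(text):
    """True if any spam keyword occurs in the text, ignoring case."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in SPAM_KEYWORDS)

def spam_reasons(screen_name, profile, suspended=False):
    """Return the letters of the rules (a to e) that an account matches."""
    reasons = []
    if suspended:                       # rule a: suspended by Twitter
        reasons.append("a")
    if has_keyword(profile):            # rule b: keyword or hashtag in profile
        reasons.append("b")
    if has_keyword(screen_name):        # rule c: keyword in screen name
        reasons.append("c")
    elif has_keyword(screen_name.translate(LEET_MAP)):
        reasons.append("d")             # rule d: keyword hidden by zero-for-o
    if SCRIPTED_NAME.match(screen_name):
        reasons.append("e")             # rule e: script-like name
    return reasons
```

An account would then be labeled as spam when `spam_reasons` returns a non-empty list. Rule a still depends on re-checking the suspension status a few weeks later, so `suspended` has to be supplied from that later check.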
In addition, we also needed to identify normal tweets. The rules we used to
identify normal tweets were as follows:
a) We checked whether the Twitter profile of the account carried the blue
verified badge [19]; if it did, we labeled the account as a normal
account.
b) We manually checked whether the content of the tweets was legitimate or spam.
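The date-based split between training and testing data described at the start of this subsection can be sketched as below. The record layout (a dict with a `created_at` timestamp) and the use of UTC are assumptions, since the text does not say how the tweets were stored or which time zone the cutoff refers to.

```python
from datetime import datetime, timezone

# Cutoff from the text: data collected before June 22 is training data.
# The year follows from the 2014 collection window; UTC is an assumption.
CUTOFF = datetime(2014, 6, 22, tzinfo=timezone.utc)

def split_by_date(tweets):
    """Split tweets into (train, test) by their collection timestamp."""
    train = [t for t in tweets if t["created_at"] < CUTOFF]
    test = [t for t in tweets if t["created_at"] >= CUTOFF]
    return train, test

# Hypothetical records for illustration only.
sample = [
    {"id": 1, "created_at": datetime(2014, 5, 14, tzinfo=timezone.utc)},
    {"id": 2, "created_at": datetime(2014, 6, 21, 23, 59, tzinfo=timezone.utc)},
    {"id": 3, "created_at": datetime(2014, 6, 22, tzinfo=timezone.utc)},
    {"id": 4, "created_at": datetime(2014, 7, 15, tzinfo=timezone.utc)},
]
train, test = split_by_date(sample)
```

Splitting by time rather than at random keeps all test tweets later than the training tweets, which matches how the data are described.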
 