where $u$ is a node in the message-passing graph, $u \in V$, and $n$ is the total number of nodes.
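The average clustering coefficient whose definition ends with the sentence above can be sketched as follows. This is a minimal sketch assuming that definition is the usual mean of the local clustering coefficients over all $n$ nodes, that the message-passing graph is treated as undirected, and that it is stored as a hypothetical adjacency map from each node to its set of neighbours. Nodes with fewer than two neighbours are assumed to contribute 0.

```python
from itertools import combinations


def average_clustering(adj: dict[int, set[int]]) -> float:
    """Mean local clustering coefficient over all n nodes (assumed definition).

    adj maps each node u in V to the set of its neighbours (undirected graph).
    Nodes with degree < 2 are counted with a local coefficient of 0.
    """
    n = len(adj)
    if n == 0:
        return 0.0
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # no pair of neighbours, so the local coefficient is 0
        # count edges among the neighbours of u, i.e. triangles through u
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / n
```

For a triangle {1, 2, 3} with an extra vertex 4 attached to 3, the local coefficients are 1, 1, 1/3 and 0, so the average is 7/12.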
Transitivity: As shown in Equation 3, transitivity is three times the ratio of the number of triangles in the graph (the numerator) to the number of triads (the denominator), where a triad is a pair of edges sharing a vertex [11]:

$$\mathrm{Trans} = 3 \times \frac{\text{number of triangles}}{\text{number of triads}} \qquad (3)$$
Compared with legitimate tweets, the message-passing graphs of spam tweets have higher transitivity.
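Equation 3 can be computed directly from an adjacency map, as in the minimal sketch below. It assumes the message-passing graph is treated as an undirected simple graph stored as a hypothetical node-to-neighbour-set dictionary. Scanning each vertex's neighbour pairs counts every triangle once per corner, which supplies the factor of 3 without a separate multiplication.

```python
from itertools import combinations


def transitivity(adj: dict[int, set[int]]) -> float:
    """Trans = 3 * (#triangles) / (#triads) for an undirected simple graph."""
    closed_pairs = 0  # equals 3 * number of triangles (one per corner)
    triads = 0        # pairs of edges that share a vertex
    for u, nbrs in adj.items():
        k = len(nbrs)
        triads += k * (k - 1) // 2
        # neighbour pairs of u that are themselves connected close a triangle
        closed_pairs += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed_pairs / triads if triads else 0.0
```

For example, a triangle {1, 2, 3} with an extra vertex 4 attached to 3 has one triangle and five triads, so Trans = 3 × 1/5 = 0.6; a star graph has no triangles and a transitivity of 0.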
Degree variance: Variance measures the dispersion of a set of data. We use the variance of the vertex degrees, defined in Equation 4:
$$DV = \frac{1}{n}\sum_{i=1}^{n}\left(\deg(i) - \overline{\deg}\right)^2 \qquad (4)$$
where $\deg(i)$ is the degree of the $i$-th node and $\overline{\deg}$ is the average degree of all nodes in the message-passing graph. This metric measures whether the message-passing graph contains small communities. The degree variance of legitimate tweets is higher than that of spam tweets.
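Equation 4 is the population variance (dividing by $n$, not $n-1$) of the degree sequence. A minimal sketch, again assuming a hypothetical undirected adjacency map where a node's degree is the size of its neighbour set:

```python
def degree_variance(adj: dict[int, set[int]]) -> float:
    """DV = (1/n) * sum_i (deg(i) - mean_deg)^2 over all n nodes."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    n = len(degrees)
    if n == 0:
        return 0.0
    mean_deg = sum(degrees) / n
    return sum((d - mean_deg) ** 2 for d in degrees) / n
```

A star with one hub and three leaves has degrees [3, 1, 1, 1] and DV = 0.75, while a triangle with one pendant vertex has degrees [2, 2, 3, 1] and DV = 0.5. The same value could be obtained with `statistics.pvariance` applied to the degree list.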
Extractor Based on Time Evolution Feature. In our system, we set the unit time interval to five minutes, which means we divide an hour into twelve blocks. For each block we construct the message-passing graph and extract from it the graph-based features (average clustering coefficient, transitivity, and degree variance), as shown in Equation 5:
$$G_j^i = \langle ac_j^i,\ trans_j^i,\ dv_j^i \rangle \qquad (5)$$

where $ac_j^i \in AC$, $trans_j^i \in Trans$, $dv_j^i \in DV$, and $G$ represents the message-passing graph; $i$ is the index of the time interval block and $j$ is the index of the tweet ID. When the time range is taken into account, these features can be combined into the time evolution feature $\bar{G}_j$. Let $\bar{G}_j = \langle G_j^1, G_j^2, G_j^3, \ldots, G_j^m \rangle$ denote the time evolution feature of the tweet with index $j$, where $m$ is the total number of time interval blocks; here $m$ is twelve. We then obtain the matrix $M$ as shown in Equation 6:

$$M = [\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_p]^T \qquad (6)$$
where p is the total number of tweets.
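The five-minute blocking and the assembly of matrix $M$ can be sketched in Python as below. This is a minimal sketch under assumptions the text does not state: each tweet's message-passing edges are available as hypothetical `(u, v, t)` tuples with `t` in seconds since the tweet was posted, each block's graph contains only the edges that fall inside that block (rather than all edges so far), and each time evolution feature is flattened into one row of 3m = 36 numbers. The names `split_into_blocks`, `time_evolution_feature`, and `build_matrix` are illustrative, and the per-block feature function is passed in rather than fixed.

```python
from collections import defaultdict
from typing import Callable, Iterable

BLOCK_SECONDS = 5 * 60               # unit time interval: five minutes
M_BLOCKS = 60 * 60 // BLOCK_SECONDS  # m = 12 blocks in one hour

Edge = tuple[int, int, float]        # hypothetical (sender, receiver, seconds since posting)
Adjacency = dict[int, set[int]]      # node -> set of neighbours (undirected)
Triple = tuple[float, float, float]  # (ac, trans, dv) for one block


def split_into_blocks(edges: Iterable[Edge]) -> list[Adjacency]:
    """Build one message-passing graph per five-minute block.

    Assumption: block i holds only edges with t in
    [i * BLOCK_SECONDS, (i + 1) * BLOCK_SECONDS); edges after the
    first hour and self-loops are ignored.
    """
    blocks = [defaultdict(set) for _ in range(M_BLOCKS)]
    for u, v, t in edges:
        i = int(t // BLOCK_SECONDS)
        if u == v or not (0 <= i < M_BLOCKS):
            continue
        blocks[i][u].add(v)
        blocks[i][v].add(u)
    return [dict(b) for b in blocks]


def time_evolution_feature(edges: Iterable[Edge],
                           features: Callable[[Adjacency], Triple]) -> list[float]:
    """Flatten <G_j^1, ..., G_j^m> for one tweet into a vector of length 3m."""
    row: list[float] = []
    for adj in split_into_blocks(edges):
        row.extend(features(adj))  # append (ac, trans, dv) of this block
    return row


def build_matrix(edges_per_tweet: Iterable[Iterable[Edge]],
                 features: Callable[[Adjacency], Triple]) -> list[list[float]]:
    """Stack the p time evolution features row-wise, so M is p x 3m."""
    return [time_evolution_feature(e, features) for e in edges_per_tweet]
```

In practice `features` would return the triple of Equation 5 (average clustering coefficient, transitivity, degree variance) for one block's graph. Flattening is only one possible layout; a model that consumes sequences could instead keep $M$ as a p × m × 3 array.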
3.4 Spam Detection
After building the message-passing graph, extracting the graph-based features, combining them into the time evolution feature, and obtaining matrix $M$, the final step of
 