Collective Classification for Spam Filtering - Computational Intelligence in Security for Information Systems

For this reason, solutions for detecting collusion attacks on reputation systems have recently started to appear. Machine learning has always been an attractive option, given that it copes well with the uncertainties that exist in security. A representative solution using hierarchical clustering is given in [8]. This solution, like many others, labels the clusters that contain the majority of the data as “good” clusters after training. However, this imposes restrictions on the training data: if the algorithm does not process “unclean” data during training, it will not be able to detect attacks. A solution based on graph theory is given in [9]. Instead of a count-based scheme that considers the number of accusations, it uses a community-based scheme that detects up to 90% of the attackers, which permits the correct operation of the system.

Thus, our aim is to design a detection-based solution that overcomes the above-mentioned issues. Namely, we want a solution that places no restrictions on the training data and that is capable of detecting up to 100% of malicious entities.

3 Proposed Solution

3.1 Feature Extraction and Formation of Model

For each entity, the feature vector is formed from the recommendations the others give on it. The main idea is to find inconsistencies in recommendations. If the reputation system considers the different services each entity offers separately, each service is characterized and examined independently. The characterization is based on the idea of k-grams and is performed at equidistant moments in time, using the recommendations issued between consecutive moments. The features are the different sets of recommendations (k-grams) and their occurrence or frequency during the characterization period. Let the recommendations issued for the node n from five different nodes during 10 sample periods be those given in Table 1.

Table 1. Example of recommendations

Sample period   Recommendations received (from nodes n1-n5)
1               100  99  100  95  99
2               100  99  100  95  99
3               100  99  100  95  99
4               98  99  98  99
5               98  99  98  99
6               98  99  98  99
7               98  99  98  99
8               95  97  08
9               95  97  08
10              95  95  97  08
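The characterization step described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation. It assumes that the set of recommendations received in one sample period is treated as one k-gram. The function name extract_kgram_features and the sample values are hypothetical.

```python
from collections import Counter


def extract_kgram_features(samples):
    """Map each distinct k-gram to (occurrence, frequency).

    Assumption: `samples` holds one tuple per sample period, and each
    tuple (the recommendations received in that period) is one k-gram.
    """
    if not samples:
        return {}
    counts = Counter(tuple(s) for s in samples)
    total = len(samples)
    return {kgram: (n, n / total) for kgram, n in counts.items()}


# Hypothetical recommendations for one entity over 10 sample periods
# (illustrative values only, not the data of Table 1).
samples = (
    [(100, 99, 100)] * 3
    + [(98, 99)] * 4
    + [(95, 97)] * 3
)

features = extract_kgram_features(samples)
for kgram, (occurrence, frequency) in features.items():
    print(kgram, occurrence, frequency)
```

Either the occurrence or the frequency can serve as the feature value. Because the result is keyed by the k-grams actually observed, two characterization periods can yield different numbers of features.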

For the recommendations in Table 1, the extracted k-grams, i.e. features, and their corresponding feature values are given in Table 2. This example shows that the number of different extracted k-grams does not have to be the same in every characterization period. Thus, we
