For this reason, solutions for detecting collusion attacks on reputation systems
have recently started to appear. Machine learning has always been an attractive
solution, given that it copes well with the uncertainties that exist in security. A
representative solution using hierarchical clustering is given in [8]. This solution,
like many others, labels the clusters that contain the majority of the data after
training as “good” clusters. However, this imposes restrictions on the training data:
if the algorithm does not process “unclean” data during training, it will not be able
to detect attacks.
A solution based on graph theory is given in [9]. Instead of a count-based scheme
that considers the number of accusations, it uses a community-based scheme that
detects up to 90% of the attackers, which permits the correct operation of the
system.
Thus, our aim is to design a detection-based solution that overcomes the
above-mentioned issues. Namely, we want to provide a solution that places no
restrictions on the training data and that is capable of detecting up to 100% of
malicious entities.
3 Proposed Solution
3.1 Feature Extraction and Formation of Model
For each entity, the feature vector is formed from the recommendations that the
others give on it. The main idea is to find inconsistencies in the recommendations.
If the reputation system considers the different services each entity offers
separately, each service is characterized and examined independently. The
characterization is based on the idea of k-grams and is performed at equidistant
moments in time, using the recommendations issued between consecutive moments.
The features are the different sets of recommendations (k-grams) and their
occurrence or frequency during the characterization period. Let the
recommendations issued for the node n by five different nodes during 10 sample
periods be those given in Table 1.
Table 1. Example of recommendations
Sample period   n1    n2    n3    n4    n5
1               100   99    100   95    99
2               100   99    100   95    99
3               100   99    100   95    99
4               98    99    98    98    99
5               98    99    98    98    99
6               98    99    98    98    99
7               98    99    98    98    99
8               95    95    97    97    08
9               95    95    97    97    08
10              95    95    97    97    08
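The counting step described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: it assumes that each row of Table 1 (the five recommendations observed in one sample period) is treated as a single k-gram, and that the feature value is either its occurrence count or that count divided by the number of sample periods. The names `TABLE_1` and `extract_kgram_features` are hypothetical.

```python
from collections import Counter

# Rows of Table 1: recommendations from n1..n5 for sample periods 1..10.
# The last value in periods 8-10 is printed as "08" in the table; it is
# entered here as the integer 8.
TABLE_1 = [
    (100, 99, 100, 95, 99),
    (100, 99, 100, 95, 99),
    (100, 99, 100, 95, 99),
    (98, 99, 98, 98, 99),
    (98, 99, 98, 98, 99),
    (98, 99, 98, 98, 99),
    (98, 99, 98, 98, 99),
    (95, 95, 97, 97, 8),
    (95, 95, 97, 97, 8),
    (95, 95, 97, 97, 8),
]


def extract_kgram_features(rows):
    """Map each distinct k-gram to (occurrence, frequency).

    Assumption: one row equals one k-gram, and frequency is the
    occurrence count divided by the number of rows.
    """
    counts = Counter(tuple(row) for row in rows)
    total = len(rows)
    return {gram: (n, n / total) for gram, n in counts.items()}


features = extract_kgram_features(TABLE_1)
for gram, (occurrence, frequency) in features.items():
    print(gram, occurrence, frequency)
```

Under these assumptions the ten sample periods collapse into three distinct k-grams. A different characterization period could yield a different number of distinct k-grams, so a dictionary keyed by k-gram is a more natural container here than a fixed-length vector.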
In this case, the extracted k-grams, i.e. features, and their corresponding feature
values are given in Table 2. From this example it is obvious that the number of
different k-grams extracted does not have to be the same in all characterization
periods. Thus, we
 