the first point; $y$ is the second point; and $x_i$ and $y_i$ are the $i$-th components of
the first and second points, respectively.
- Cosine Similarity. It is a measure of the similarity between two vectors obtained by
finding the cosine of the angle between them [20]. Since we are measuring
distance and not similarity, we have used 1 - Cosine Similarity as a distance
measure:

$$d(x, y) = 1 - \cos(\theta) = 1 - \frac{\vec{v} \cdot \vec{u}}{\|\vec{v}\| \cdot \|\vec{u}\|}$$

where $\vec{v}$ is the vector from the origin of the feature space to the first point $x$,
$\vec{u}$ is the vector from the origin of the feature space to the second point $y$,
$\vec{v} \cdot \vec{u}$ is the inner product of $\vec{v}$ and $\vec{u}$, and
$\|\vec{v}\| \cdot \|\vec{u}\|$ is the product of the Euclidean norms of $\vec{v}$ and $\vec{u}$.
For feature vectors with non-negative components, this distance ranges from
0 to 1, where 1 means that the two executables are completely different (i.e.,
the vectors are orthogonal) and 0 means that the vectors point in the same
direction, so the executables are treated as the same.
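The cosine distance defined above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function name `cosine_distance` and the choice to reject zero vectors (for which the angle is undefined) are assumptions made here.

```python
import math


def cosine_distance(v, u):
    """Return 1 - cos(theta) for two equal-length feature vectors.

    Hypothetical helper for illustration; the paper does not name one.
    """
    if len(v) != len(u):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(v, u))      # inner product v . u
    norm_v = math.sqrt(sum(a * a for a in v))   # ||v||
    norm_u = math.sqrt(sum(b * b for b in u))   # ||u||
    if norm_v == 0.0 or norm_u == 0.0:
        # Assumption: the angle is undefined for a zero vector, so we refuse it.
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_v * norm_u)
```

With this sketch, orthogonal vectors such as `[1, 0]` and `[0, 1]` give a distance of 1, while vectors that point in the same direction, such as `[1, 2]` and `[2, 4]`, give a distance of approximately 0.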
By means of these measures, we are able to compute the deviation of an
executable with respect to a set of not packed executables. Since we have to compute
this measure against every point representing a not packed executable, a
combination rule is required in order to obtain a final distance value that considers
every measure performed. To this end, our system employs 3 simple rules: (i)
select the mean value, (ii) select the lowest distance value and (iii) select the
highest value of the computed distances. In this way, when our method inspects
an executable, a final distance value is acquired, which depends on both the
distance measure and the combination rule.
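The three combination rules can be sketched as follows. This is a hedged illustration: the names `deviation` and `COMBINATION_RULES` are hypothetical, and the `euclidean` function is only a stand-in distance chosen to keep the example self-contained; any of the distance measures described in this section could be passed instead.

```python
import math
from statistics import mean


def euclidean(x, y):
    """Stand-in distance used only to make this sketch runnable."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# The three rules from the text: mean, lowest and highest distance.
COMBINATION_RULES = {"mean": mean, "min": min, "max": max}


def deviation(sample, normal_points, distance, rule):
    """Combine the distances from `sample` to every point of normality.

    `normal_points` holds the feature vectors of not packed executables.
    """
    distances = [distance(sample, p) for p in normal_points]
    return COMBINATION_RULES[rule](distances)


normal = [(0.0, 0.0), (3.0, 4.0)]
print(deviation((0.0, 0.0), normal, euclidean, "mean"))  # 2.5
```

The final value therefore depends on two independent choices, the distance function and the rule name, which matches the description in the paragraph above.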
4 Empirical Validation
To evaluate our anomaly-based packed executable detector, we collected a dataset
comprising 500 not packed executables and 1,000 packed executables. The not packed
dataset is composed of 250 benign executables and 250 malicious executables
gathered from the VxHeavens [21] website. The packed dataset is composed of 500
benign executables and 500 malicious executables from VxHeavens [21]. To
generate the packed dataset, we took not packed executables and packed
them using 10 different packing tools with different configurations: Armadillo,
ASProtect, FSG, MEW, PackMan, RLPack, SLV, Telock, Themida and UPX.
Then, using this dataset, we performed a 5-fold cross-validation, dividing the
not packed dataset into 5 different divisions of 400 executables for representing
normality and 100 for measuring deviations. In this way, each fold is composed of
400 not packed executables that are used as the representation of normality and
1,100 testing executables, of which 100 are not packed and 1,000 are packed.
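The fold construction described above can be sketched as a small generator. This is only an illustration under stated assumptions: the name `five_fold_splits` is hypothetical, the samples are assumed to be already shuffled by the caller, and the size of the not packed set is assumed to be divisible by the number of folds.

```python
def five_fold_splits(not_packed, packed, k=5):
    """Yield (normality, testing) pairs, one per fold.

    Each fold holds out 1/k of the not packed samples for testing and keeps
    the rest as the representation of normality. All packed samples are
    added to the testing set of every fold.
    """
    fold_size = len(not_packed) // k
    for i in range(k):
        start, stop = i * fold_size, (i + 1) * fold_size
        held_out = not_packed[start:stop]
        normality = not_packed[:start] + not_packed[stop:]
        testing = held_out + packed
        yield normality, testing


# Placeholder identifiers standing in for the real executables.
not_packed = [("not_packed", i) for i in range(500)]
packed = [("packed", i) for i in range(1000)]

for normality, testing in five_fold_splits(not_packed, packed):
    print(len(normality), len(testing))  # 400 1100 for every fold
```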
Next, we extracted their structural characteristics and employed the 3
different measures and the 3 different combination rules described in Section 3 to
obtain a final deviation measure for each tested executable. For each measure and
combination rule, we established 10 different thresholds to determine whether
an executable is packed or not.
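The thresholding step can be sketched as below. The text does not state how the 10 thresholds were chosen, so the evenly spaced grid between the lowest and highest observed deviation is purely an assumption of this example, as are the names `threshold_grid` and `classify` and the convention that a deviation strictly above the threshold is flagged as packed.

```python
def threshold_grid(scores, n=10):
    """Return n evenly spaced thresholds spanning the observed scores.

    Assumption: even spacing; the source does not specify the selection.
    """
    lo, hi = min(scores), max(scores)
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def classify(scores, threshold):
    """Flag a sample as packed (True) when its deviation exceeds the threshold."""
    return [s > threshold for s in scores]


# Hypothetical deviation values for four tested executables.
scores = [0.0, 2.0, 4.5, 9.0]
for t in threshold_grid(scores):
    print(t, classify(scores, t))
```

Sweeping the threshold in this way trades false positives against false negatives: a low threshold flags almost everything as packed, while a high one flags almost nothing.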
We evaluated accuracy by measuring False Negative Ratio (FNR) and False
Positive Ratio (FPR). FNR is defined as: $FNR(\beta) = \frac{FN}{FN + TP}$, where TP is the
number of packed executable cases correctly classified (true positives) and FN is
 