sections (0.7127); the maximum raw data per virtual size ratio (0.5755)
(rawSize/virtualSize, where rawSize is the section raw data size and
virtualSize is the section virtual size, both expressed in bytes),
the number of readable and executable sections (0.5725) and the number of
sections with a raw data per virtual size ratio lower than 1 (0.4842).
- Section of entry point characteristics (29). This group contains
characteristics of the section that is executed once the executable is
loaded into memory. 26 characteristics have an IG value greater than 0,
of which 11 have significant relevance: the characteristics field in its
raw state (0.9757), whether it is writable (0.9715), the raw data per
virtual size ratio (0.9244), the virtual address (0.7386), whether it has a
pointer to raw data (0.6064), whether it is a standard section (0.5203),
the virtual size (0.4056), whether it contains initialized data (0.3721), the
size of raw data (0.2958) and whether it is executable (0.1575).
- Entropy values (24). We have selected 24 entropy values, commonly used
in previous work [18], of which 22 have an IG value greater than 0 and
9 have a relevant IG value: max section entropy (0.8375), mean code
section entropy (0.7656), mean section entropy (0.7359), file entropy (0.6955),
entropy of the section of entry point (0.6756), mean data section entropy
(0.5637), header entropy (0.1680), number of sections with an entropy value
greater than 7.5 (0.7445), and number of sections with an entropy value
between 7 and 7.5 (0.1059).
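As an illustration of how entropy features of this kind could be derived, the following minimal sketch assumes byte-level Shannon entropy (the text above does not state the exact formula used). The function names, the feature keys, the `sections` mapping from section name to raw bytes, and the inclusive bounds of the 7 to 7.5 bucket are all hypothetical choices made for the example.

```python
import math
from collections import Counter
from typing import Dict


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_features(sections: Dict[str, bytes]) -> Dict[str, float]:
    """Summarise per-section entropies.

    `sections` is a hypothetical mapping from section name to raw bytes;
    the thresholds 7 and 7.5 follow the feature descriptions in the text.
    """
    values = [shannon_entropy(raw) for raw in sections.values()]
    return {
        "max_section_entropy": max(values, default=0.0),
        "mean_section_entropy": sum(values) / len(values) if values else 0.0,
        "sections_above_7_5": sum(1 for v in values if v > 7.5),
        # Inclusive bounds are an assumption of this sketch.
        "sections_between_7_and_7_5": sum(1 for v in values if 7.0 <= v <= 7.5),
    }
```

A section whose bytes are uniformly distributed reaches the maximum of 8 bits per byte, which is why compressed or encrypted (packed) content tends to fall into the high-entropy buckets.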
Every feature is thus represented as a decimal value and then normalized by
dividing it by the maximum value of that feature in the whole dataset.
Each executable can therefore be represented as a vector of decimal values that
range from 0 to 1. The final step applies the relevance obtained from IG by
multiplying each value in the normalized vector by its relevance.
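The normalization and weighting steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes non-negative feature values, and the function name `normalize_and_weight` and its parameters are hypothetical. A feature whose maximum is 0 is mapped to 0 to avoid division by zero, which is also an assumption of the sketch.

```python
from typing import List


def normalize_and_weight(samples: List[List[float]],
                         relevance: List[float]) -> List[List[float]]:
    """Scale each feature by its dataset maximum, then multiply by its IG relevance.

    `samples` holds one feature vector per executable (all of equal length);
    `relevance` holds one IG value per feature.
    """
    n_features = len(relevance)
    # Maximum of each feature over the whole dataset.
    maxima = [max(row[j] for row in samples) for j in range(n_features)]
    weighted = []
    for row in samples:
        weighted.append([
            (row[j] / maxima[j] if maxima[j] > 0 else 0.0) * relevance[j]
            for j in range(n_features)
        ])
    return weighted


# Two executables, two features, with hypothetical relevance values.
vectors = normalize_and_weight([[2.0, 10.0], [4.0, 5.0]], [0.5, 1.0])
```

After weighting, each component lies between 0 and the relevance of its feature, so features with low IG contribute less to any distance computed on these vectors.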
3 Anomaly Detection
Using the features described in the previous section, our method represents
unpacked executables as points in the feature space. When an executable is
inspected, our method first computes the coordinates of its point in the
feature space. This point is then compared with the previously calculated
points of the unpacked executables.
To this end, distance measures are required. In this study, we have used the
following distance measures:
- Manhattan Distance. This distance between two points x and y is the
sum of the lengths of the projections of the line segment between the points
onto the coordinate axes: d(x, y) = \sum_{i=0}^{n} |x_i - y_i|,
where x is the first point; y is the second point; and x_i and y_i are the
i-th components of the first and second point, respectively.
- Euclidean Distance. This distance is the length of the line segment
connecting two points. It is calculated as:
d(x, y) = \sqrt{\sum_{i=0}^{n} (x_i - y_i)^2},
where x is
 