Structural Feature Based Anomaly Detection for Packed Executable Identification - Computational Intelligence in Security for Information Systems


sections (0.7127); the maximum raw data per virtual size ratio (0.5755)

(rawSize/virtualSize, where rawSize is defined as the section raw data

size and virtualSize is the section virtual size, both expressed in bytes),

the number of readable and executable sections (0.5725) and the number of

sections with a raw data per virtual size ratio lower than 1 (0.4842).

- Section of entry point characteristics (29). This group contains characteristics relative to the section that is executed once the executable is loaded into memory. 26 characteristics have an IG value greater than 0, of which 11 have a significant relevance: the characteristics field in its raw state (0.9757), whether it is writable (0.9715), the raw data per virtual size ratio (0.9244), the virtual address (0.7386), whether it is a pointer to raw data or not (0.6064), whether it is a standard section or not (0.5203), the virtual size (0.4056), whether it contains initialized data (0.3721), the size of raw data (0.2958) and whether it is executable (0.1575).

- Entropy values (24). We have selected 24 entropy values, commonly used in previous works [18], of which 22 have an IG value greater than 0 and 9 have a relevant IG value: max section entropy (0.8375), mean code section entropy (0.7656), mean section entropy (0.7359), file entropy (0.6955), entropy of the section of entry point (0.6756), mean data section entropy (0.5637), header entropy (0.1680), number of sections with an entropy value greater than 7.5 (0.7445), and number of sections with an entropy value between 7 and 7.5 (0.1059).

Every feature is thus represented as a decimal value and then normalized by dividing each value by the maximum value for that feature in the whole dataset. Each executable can therefore be represented as a vector of decimal values that range from 0 to 1. The final step applies the relevance obtained from IG by multiplying each value in the normalized vector by its relevance.
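The normalization and weighting steps can be illustrated with a minimal Python sketch. The function name `normalize_and_weight`, the list-of-lists layout and the guard for an all-zero feature are assumptions made for illustration; they are not taken from the original work. The sketch also assumes non-negative feature values, so that dividing by the maximum yields values between 0 and 1.

```python
def normalize_and_weight(vectors, relevance):
    """Scale each feature by its dataset-wide maximum, then weight it by IG.

    vectors   -- one list of raw feature values per executable (hypothetical layout)
    relevance -- one IG relevance value per feature, in the same order
    """
    n_features = len(relevance)
    # Maximum of each feature over the whole dataset.
    maxima = [max(v[j] for v in vectors) for j in range(n_features)]

    weighted = []
    for v in vectors:
        row = []
        for j in range(n_features):
            # Assumption: a feature whose maximum is 0 is mapped to 0.
            scaled = v[j] / maxima[j] if maxima[j] else 0.0
            row.append(scaled * relevance[j])
        weighted.append(row)
    return weighted


# Two executables, two features, with made-up relevance values.
weighted = normalize_and_weight([[2.0, 10.0], [4.0, 5.0]], [0.5, 1.0])
```

After weighting, a feature with low relevance contributes proportionally less to any distance computed between two vectors.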

3 Anomaly Detection

Through the features described in the previous section, our method represents

unpacked executables as points in the feature space. When an executable is being

inspected, our method starts by computing the values of the point in the feature

space. This point is then compared with the previously calculated points of the

unpacked executables.

To this end, distance measures are required. In this study, we have used the

following distance measures:

- Manhattan Distance. This distance between two points x and y is the sum of the lengths of the projections of the line segment between the points onto the coordinate axes: $d(x, y) = \sum_{i=0}^{n} |x_i - y_i|$, where x is the first point; y is the second point; and $x_i$ and $y_i$ are the i-th components of the first and second point, respectively.

- Euclidean Distance. This distance is the length of the line segment connecting two points. It is calculated as: $d(x, y) = \sqrt{\sum_{i=0}^{n} (x_i - y_i)^2}$, where x is the first point and y is the second point, as in the Manhattan distance.
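As a minimal sketch, the two distance measures can be written in Python as follows. The function names and the length check are illustrative assumptions, and the Euclidean distance uses its standard definition (the square root of the sum of squared component differences).

```python
import math


def manhattan(x, y):
    """Sum of absolute component-wise differences between x and y."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of components")
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    """Length of the straight line segment connecting x and y."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of components")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical weighted feature vectors of two executables.
u = [0.0, 0.0]
v = [3.0, 4.0]
d_manhattan = manhattan(u, v)
d_euclidean = euclidean(u, v)
```

Either function can be used to compare the point of an inspected executable with the previously calculated points of the unpacked executables.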
