executable is benign or malicious, but whether the executable was packed
was not considered. Perdisci et al. [18] and later Farooq et al. [16] used
heuristics such as entropy or certain section characteristics to determine whether
an executable is packed, as a preliminary step before a deeper analysis. In this
paper, we combine both points of view, structural characteristics and heuristics,
and provide a statistical analysis of their actual relevance for determining
the packed state of an executable.
One of the main requirements of our anomaly detection system is speed,
since it acts as a filtering step before a heavier unpacking process.
Therefore, we selected a set of features whose extraction does not require
significant processing time, and avoided techniques such as code disassembly, string
extraction or n-gram analysis [18], which slow down the sample comparison.
Features can be divided into four main groups: 125 raw header characteristics
[17], 33 section characteristics (i.e., the number of sections that meet certain
properties), 29 characteristics of the section containing the entry point (the section
that is executed first once the executable is loaded into memory) and,
finally, 24 entropy values. We apply relevance weights to each feature based on
Information Gain (IG) [19]. IG assigns each feature a value that measures how
important it is for deciding whether a sample is packed. These weights were
calculated from a dataset of 1,000 packed and 1,000 non-packed executables.
They are useful not only to obtain a better distance rating among samples, but
also to reduce the number of selected features, given that only 151 of the 211
features have a non-zero IG value.
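As an illustration of the weighting step, the following is a minimal sketch of how IG can be computed for a single discrete feature, using the standard definition IG = H(class) - H(class | feature). It is not the implementation used in this work: the function names `entropy` and `information_gain` and the toy data are hypothetical, and continuous features (such as sizes or addresses) are assumed to have been discretized beforehand.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(values, labels):
    """IG of a discrete feature: H(class) - H(class | feature)."""
    total = len(labels)
    # Group the class labels by the feature value observed for each sample.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: entropy of each group weighted by the group's size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data (hypothetical): two packed and two non-packed samples.
labels = ["packed", "packed", "clean", "clean"]
separating_feature = [1, 1, 0, 0]  # splits the two classes perfectly
constant_feature = [7, 7, 7, 7]    # carries no information about the class

print(information_gain(separating_feature, labels))  # 1.0
print(information_gain(constant_feature, labels))    # 0.0
```

A feature whose IG is zero, like the constant one above, does not help to separate the two classes, which is why such features can be discarded.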
- DOS header characteristics (31). The first bytes of the PE file header
correspond to the DOS executable header fields. IG results showed that
these characteristics are not especially relevant: the maximum IG value is
0.23, which corresponds to a reserved field that intuitively should not be
relevant. 15 values range from 0.10 to 0.16, and the rest have a
relevance below 0.10.
- File header block (23). This header block is present in both image files
(.EXE) and object files. From a total of 23 characteristics, 14 have an IG
value greater than 0, and only 2 of them have an IG value greater than 0.01:
the number of sections (0.3112) and the time stamp (0.1618).
- Optional header block (71). This optional block is only present in image
files and contains data about how the executable must be loaded into
memory. 37 features have an IG value over 0, and the most relevant ones are:
the address of entry point (0.5111), the Import Address Table (IAT) size
(0.3832) and address (0.3733) (both related to the number of imported DLLs),
the size of the code (0.3011), the base of the data (0.2817), the base of the
code (0.2213), the major linker version (0.1996), checksum (0.1736), the size
of initialized data (0.1661), the size of headers (0.1600), the size of relocation
table (0.1283) and the size of image (0.1243).
- Section characteristics (33). Of the 33 characteristics that make up
this group, 22 have an IG value greater than 0. The most significant ones
are: the number of non-standard sections (0.7606), the number of executable
 