5 Discussion and Conclusions
Like previous work, our method focuses on executable pre-filtering as an initial phase that decides whether or not a sample should be analysed by a generic unpacker. Our main contribution is the anomaly-detection-based approach employed for packed executable identification. In contrast to previous approaches, this method does not require previously labelled packed and not-packed executables, as it measures the deviation of executables with respect to normality (not-packed executables). Moreover, because it does not use packed samples for comparison, it is independent of the packer used to protect the executables. Although anomaly detection systems tend to produce high false positive rates, our experimental results show very low values in all cases, which supports the validity of our initial hypothesis.
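As an illustration only, the following minimal sketch shows the kind of deviation measurement described above, assuming numeric feature vectors, a Manhattan distance, and a placeholder threshold; the dimensionality, distance measure, and threshold are illustrative assumptions, not the exact configuration used in our experiments.

```python
import numpy as np

def anomaly_score(sample, normal_vectors):
    """Mean Manhattan distance from the sample to the not-packed reference vectors."""
    return np.abs(normal_vectors - sample).sum(axis=1).mean()

def is_packed(sample, normal_vectors, threshold):
    """Flag the executable as packed when it deviates too much from normality."""
    return anomaly_score(sample, normal_vectors) > threshold

# Placeholder data: 400 not-packed feature vectors of dimension 20.
rng = np.random.default_rng(0)
normal_vectors = rng.random((400, 20))
sample = rng.random(20)
print(is_packed(sample, normal_vectors, threshold=6.0))
```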
Nevertheless, our method presents some limitations that should be studied in further work. First, it cannot identify the packer or the packer family used to protect the executable. Such information would help the malware analyst in the task of unpacking the executable: generic unpacking techniques are sometimes very time consuming or fail outright, and it is often easier to use specific unpacking routines created for the most commonly used packers.
Secondly, the extracted features can be modified by malware writers in order to bypass the filter. In the case of structural features, packers could build executables using the same flags and patterns as common compilers, for instance importing common DLL files or creating the same number of sections. Heuristic analysis, in turn, can be evaded by using standard sections instead of non-standard ones, or by filling sections with padding data to unbalance the byte frequency and obtain lower entropy values. What is more, our system depends heavily on heuristics because of the relevance values obtained from IG, making it vulnerable to such attacks.
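For reference, section entropy is typically computed as the Shannon entropy of the byte distribution, which is why padding with repeated bytes drags the value below heuristic thresholds. The sketch below, with purely illustrative data, makes this effect concrete; it is not the exact heuristic implementation used by our system.

```python
import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (range 0-8)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())

packed_like = os.urandom(4096)              # compressed/encrypted-looking content: entropy near 8
padded = os.urandom(1024) + b"\x00" * 3072  # similar payload diluted with zero padding
print(byte_entropy(packed_like))            # close to 8.0
print(byte_entropy(padded))                 # noticeably lower
```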
Finally, it is important to consider efficiency and processing time. Our system compares each executable against a large dataset (400 vectors). Although Euclidean and Manhattan distances are cheap to compute, the cosine distance and more complex measures such as the Mahalanobis distance may take too much time to process every executable under analysis. For this reason, in further work we will focus on improving the system's efficiency by clustering the not-packed vectors and reducing the whole dataset to a limited number of representative samples, as sketched below.
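One possible realisation of this reduction, sketched here under the assumption of a k-means clustering step with placeholder parameters (the concrete algorithm and number of representatives remain open for future work), replaces the full reference set with its cluster centroids so that each sample is compared against far fewer vectors.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder reference set: 400 not-packed feature vectors of dimension 20.
rng = np.random.default_rng(0)
normal_vectors = rng.random((400, 20))

# Reduce the reference set to a small number of cluster centroids.
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(normal_vectors)
centroids = kmeans.cluster_centers_

# Score a new sample against 20 centroids instead of 400 vectors
# (here: Manhattan distance to the nearest centroid).
sample = rng.random(20)
score = np.abs(centroids - sample).sum(axis=1).min()
print(score)
```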
Acknowledgements
This research was partially supported by the Basque Government under a pre-
doctoral grant given to Xabier Ugarte-Pedrero. We would also like to acknowl-
edge Iker Pastor Lopez for his help in the experimental configuration.