5 Discussion and Conclusions
Like previous work, our method focuses on executable pre-filtering as an initial phase that decides whether or not a sample should be analysed by a generic unpacker. Our main contribution is the anomaly-detection-based approach employed for packed executable identification. In contrast to previous approaches, this method does not require previously labelled packed and not-packed executables, as it measures the deviation of executables with respect to normality (not-packed executables). Moreover, because it does not use packed samples for comparison, it is independent of the packer used to protect the executables. Although anomaly detection systems tend to produce high false positive rates, our experimental results show very low values in all cases, which supports the validity of our initial hypothesis.
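As an illustration only, the following minimal sketch shows the kind of deviation measurement described above, assuming numeric feature vectors, a Manhattan distance, and a placeholder threshold; the dimensionality, distance measure, and threshold are illustrative assumptions, not the exact configuration used in our experiments.

```python
import numpy as np

def anomaly_score(sample, normal_vectors):
    """Mean Manhattan distance from the sample to the not-packed reference vectors."""
    return np.abs(normal_vectors - sample).sum(axis=1).mean()

def is_packed(sample, normal_vectors, threshold):
    """Flag the executable as packed when it deviates too much from normality."""
    return anomaly_score(sample, normal_vectors) > threshold

# Placeholder data: 400 not-packed feature vectors of dimension 20.
rng = np.random.default_rng(0)
normal_vectors = rng.random((400, 20))
sample = rng.random(20)
print(is_packed(sample, normal_vectors, threshold=6.0))
```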
Nevertheless, our method presents some limitations that should be studied in further work. First, it cannot identify the packer or the packer family used to protect the executable. Such information would help the malware analyst in the task of unpacking the executable: generic unpacking techniques are sometimes very time consuming or fail outright, and it is often easier to use specific unpacking routines created for the most commonly used packers.
Secondly, the extracted features can be modified by malware writers in order to bypass the filter. In the case of structural features, packers could build executables using the same flags and patterns as common compilers, for instance importing common DLL files or creating the same number of sections. Heuristic analysis, in turn, can be evaded by using standard sections instead of non-standard ones, or by filling sections with padding data to unbalance the byte frequency and obtain lower entropy values. What is more, our system depends heavily on heuristics because of the relevance values obtained from IG, making it vulnerable to such attacks.
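For reference, section entropy is typically computed as the Shannon entropy of the byte distribution, which is why padding with repeated bytes drags the value below heuristic thresholds. The sketch below, with purely illustrative data, makes this effect concrete; it is not the exact heuristic implementation used by our system.

```python
import math
import os
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (range 0-8)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in Counter(data).values())

packed_like = os.urandom(4096)              # compressed/encrypted-looking content: entropy near 8
padded = os.urandom(1024) + b"\x00" * 3072  # similar payload diluted with zero padding
print(byte_entropy(packed_like))            # close to 8.0
print(byte_entropy(padded))                 # noticeably lower
```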
Finally, it is important to consider efficiency and processing time. Our system compares each executable against a large dataset (400 vectors). Although Euclidean and Manhattan distances are cheap to compute, the cosine distance and more complex measures such as the Mahalanobis distance may take too much time to process every executable under analysis. For this reason, in further work we will focus on improving the system's efficiency by clustering the not-packed vectors and reducing the whole dataset to a limited number of representative samples, as sketched below.
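One possible realisation of this reduction, sketched here under the assumption of a k-means clustering step with placeholder parameters (the concrete algorithm and number of representatives remain open for future work), replaces the full reference set with its cluster centroids so that each sample is compared against far fewer vectors.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder reference set: 400 not-packed feature vectors of dimension 20.
rng = np.random.default_rng(0)
normal_vectors = rng.random((400, 20))

# Reduce the reference set to a small number of cluster centroids.
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0).fit(normal_vectors)
centroids = kmeans.cluster_centers_

# Score a new sample against 20 centroids instead of 400 vectors
# (here: Manhattan distance to the nearest centroid).
sample = rng.random(20)
score = np.abs(centroids - sample).sum(axis=1).min()
print(score)
```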
Acknowledgements
This research was partially supported by the Basque Government under a pre-
doctoral grant given to Xabier Ugarte-Pedrero. We would also like to acknowl-
edge Iker Pastor Lopez for his help in the experimental configuration.