We consider these results significant for the anti-malware industry.
Reducing the labelling effort required for unknown malware detection can help
to deal with the increasing amount of new malware. In particular, a preliminary
test with a Bayesian Network trained with Hill climber shows an accuracy of
86.73%, which is only slightly higher than that of the presented
semi-supervised approach. However,
because of the static nature of the features we used with LLGC, our method
cannot counter packed malware. Packed malware is produced by ciphering the
payload of the executable and having it deciphered when it is finally loaded
into memory. Indeed, widely used static detection methods can deal with packed
malware only by using the signatures of the packers. Accordingly, dynamic
analysis seems to be a more promising solution to this problem [15]. One
solution for this obvious limitation of our malware detection method is the use
of a generic dynamic unpacking scheme such as PolyUnpack [16], Renovo [15],
OmniUnpack [17] or Eureka [18].
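The effect of packing on static features can be illustrated with a toy sketch. It is not the feature set or any packer used in this work: the one-byte XOR "packer", the byte n-gram features and the sample payload are all hypothetical stand-ins, chosen only to show that a ciphered payload need not share any static features with the original until it is deciphered.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams (a stand-in for a static feature)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def xor_pack(payload: bytes, key: int) -> bytes:
    """Toy 'packer': XOR every payload byte with a one-byte key.

    Applying it twice with the same key restores the payload.
    """
    return bytes(b ^ key for b in payload)


def overlap(a: Counter, b: Counter) -> float:
    """Fraction of the distinct n-grams of `a` that also occur in `b`."""
    if not a:
        return 0.0
    return len(set(a) & set(b)) / len(a)


# Hypothetical payload bytes; real code sections look different.
payload = bytes(range(0x40, 0x60)) * 4
packed = xor_pack(payload, key=0xA5)    # what a static analyser sees on disk
unpacked = xor_pack(packed, key=0xA5)   # what exists in memory after loading

static_view = overlap(byte_ngrams(payload), byte_ngrams(packed))
dynamic_view = overlap(byte_ngrams(payload), byte_ngrams(unpacked))
```

In this toy case the packed bytes share no 2-grams with the original payload, while the deciphered bytes match it exactly, which is the reason features taken after unpacking remain usable.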
5 Concluding Remarks
Unknown malware detection has become an important topic of research and
concern owing to the growth of malicious code in recent years. Moreover, it is
well known that the classic signature methods employed by antivirus vendors are
no longer completely effective in facing the large volumes of new malware.
Therefore, signature methods must be complemented with more complex approaches
that provide the detection of unknown malware families. While machine-learning
methods are a suitable approach for unknown malware, they require a high number
of labelled executables for each class (i.e., malware and benign datasets).
Since it is difficult to obtain such amounts of labelled data in a real-world
environment, a time-consuming process of analysis is mandatory.
In this paper, we propose the use of a semi-supervised learning approach
for unknown malware detection. This learning technique does not need a large
amount of labelled data; it only needs several instances to be labelled. Therefore,
this methodology can reduce efforts in unknown malware detection. By labelling
50% of the software, we can achieve results with more than 83% accuracy.
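As an illustration of how a small number of labels can spread to unlabelled instances, the following is a minimal sketch that assumes LLGC denotes the Learning with Local and Global Consistency algorithm in its usual iterative form, F <- alpha * S * F + (1 - alpha) * Y. It is not the implementation evaluated in this paper: the two-dimensional feature vectors and the values of `sigma`, `alpha` and `iters` are hypothetical and chosen only for the example.

```python
import math


def llgc(points, labels, n_classes, sigma=1.0, alpha=0.9, iters=200):
    """Iterative LLGC-style label spreading (illustrative sketch).

    points: list of feature vectors (lists of floats)
    labels: class index for labelled items, None for unlabelled ones
    Returns a predicted class index for every item.
    """
    n = len(points)
    # Affinity matrix W: Gaussian kernel with a zero diagonal.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    # Symmetric normalisation S = D^(-1/2) W D^(-1/2).
    deg = [sum(row) for row in W]
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    # Initial label matrix Y: one-hot rows for labelled items, zeros otherwise.
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    # Spread labels: F <- alpha * S F + (1 - alpha) * Y.
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n))
              + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    # Each item takes the class with the largest score.
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Hypothetical data: two well-separated groups, one labelled item in each.
points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
          [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]]
labels = [0, None, None, 1, None, None]   # 0 = benign, 1 = malware (assumed)
predicted = llgc(points, labels, n_classes=2)
```

Here the unlabelled items inherit the class of the labelled item in their own group, because affinities within a group are far larger than those across groups.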
Future work will be focused on three main directions. First, we plan to extend
our study of semi-supervised learning approaches by applying more algorithms to
this issue. Second, we will use different features for training these kinds of models.
Finally, we will focus on facing packed executables with a hybrid dynamic-static
approach.
References
1. Ollmann, G.: The evolution of commercial malware development kits and colour-
by-numbers custom malware. Computer Fraud & Security 2008(9), 4-7 (2008)
2. Lanzi, A., Balzarotti, D., Kruegel, C., Christodorescu, M., Kirda, E.: AccessMiner:
using system-centric models for malware protection. In: Proceedings of the 17th
ACM Conference on Computer and Communications Security, pp. 399-412. ACM,
New York (2010)
 