Opcode-Sequence-Based Semi-supervised
Unknown Malware Detection
Igor Santos, Borja Sanz, Carlos Laorden, Felix Brezo, and Pablo G. Bringas
S3Lab, DeustoTech - Computing, Deusto Institute of Technology
University of Deusto,
Avenida de las Universidades 24, 48007
Bilbao, Spain
{isantos,borja.sanz,claorden,felix.brezo,pablo.garcia.bringas}@deusto.es
Abstract. Malware is any computer software potentially harmful to
both computers and networks. The amount of malware is growing every
year and poses a serious global security threat. Signature-based detection
is the most widespread method in commercial antivirus software; however,
it consistently fails to detect new malware. Supervised machine learning
has been adopted to solve this issue, but its usefulness is limited
because it requires a large number of malicious executables and benign
software to be identified and labelled beforehand. In this paper, we
propose a new method of malware detection that adopts a well-known
semi-supervised learning approach to detect unknown malware. This
method examines the frequencies of appearance of opcode sequences to
build a semi-supervised machine-learning classifier from a set of labelled
(either malware or legitimate software) and unlabelled instances. We
performed an empirical validation demonstrating that the labelling effort
is lower than when supervised learning is used, while the system
maintains a high accuracy rate.
Keywords: malware detection, machine learning, semi-supervised learning.
1 Introduction
Malware is defined as any computer software explicitly designed to damage
computers or networks. While in the past malware writers sought 'fame and
glory', their motivation has since shifted to malicious economic gain [1].
Commercial anti-malware software is highly dependent on a signature
database [2]. A signature is a unique sequence of bytes that is always present
within malicious executables and in the files already infected. The main issue
with this approach is that malware analysts must wait until new malware has
harmed several computers before they can generate a signature file and provide
a solution. Suspect files under analysis are compared with the signatures in
this database. When a signature matches, the file being tested is classified
as malware. Although this approach has proven effective when threats are known
beforehand, signature methods are overwhelmed by large amounts of new malware.
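
The signature matching described above can be illustrated with a minimal sketch. It is not the implementation of any commercial product: the database contents, the signature names and the function names are hypothetical, and real engines use far more efficient multi-pattern search than a linear scan.

```python
from typing import Dict, List

# Hypothetical signature database: name -> byte sequence.
# The names and byte values are made up for illustration only.
SIGNATURES: Dict[str, bytes] = {
    "Example.A": bytes.fromhex("deadbeef"),
    "Example.B": b"\x90\x90\xeb\xfe",
}


def match_signatures(data: bytes,
                     signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all signatures whose bytes occur in `data`."""
    return [name for name, pattern in signatures.items() if pattern in data]


def is_malware(data: bytes) -> bool:
    """Classify a file as malware if at least one signature matches."""
    return bool(match_signatures(data))


if __name__ == "__main__":
    known = b"\x00\x01" + bytes.fromhex("deadbeef") + b"\x02"
    # A variant whose bytes differ from every stored signature.
    unseen = b"\x00\x01" + bytes.fromhex("deadbeee") + b"\x02"
    print(match_signatures(known))
    print(is_malware(unseen))
```

In this sketch the unseen variant is not flagged because no stored byte sequence occurs in it, which is the limitation the paragraph above points out: detection is only possible after a signature for that sample has been produced.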
 