Massive Data Mining for Polymorphic Code Detection - Computer Network Security

2. Search for execution chains: This phase analyzes the data after the NOP

zone by using a recursive function capable of following different execution

chains in disassembled code. Whenever a control-flow instruction is detected,

the function extracts the destination address and continues disassembling at

this address. Depending on the instruction, the function also follows the code

directly after the instruction. For a similar approach, we refer to [19].

3. Neural network classification: Whenever a termination criterion is met

(see [13] for details), the recursive function stops following the code and

starts the neural network classification.

The input for the neural network is the spectrum of instructions encountered

along an execution path. (Here and in the course of this paper, by

spectrum we mean a representation of the relative frequencies.) If the output

of the neural network is larger than zero, a possible shellcode is reported.

The features of the neural network were chosen by investigating the instructions

used by the available polymorphic shellcode engines. These instructions

were then used to create groups of similar instructions. Further

instructions from the x86 set were then added to the groups. The groups

are numbered and represent the features/inputs for the neural network. A

complete list can be found in [13].
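
Phases 2 and 3 can be illustrated with a minimal Python sketch. It is not the HDE implementation: the toy program, the mnemonic sets, the four instruction groups and the termination criteria (unknown address, revisited address, length cap) are all assumptions made for illustration; the actual groups and criteria are given in [13]. The sketch follows every execution chain recursively and then computes the spectrum (relative group frequencies) of each chain, which would serve as the input vector for a trained network.

```python
from collections import Counter

# Toy "disassembly": address -> (mnemonic, size in bytes, branch target or None).
# Hypothetical data; a real detector would decode raw bytes after the NOP zone.
PROGRAM = {
    0: ("xor", 2, None),
    2: ("jz", 2, 8),     # conditional jump: follow the target and the fall-through
    4: ("add", 2, None),
    6: ("jmp", 2, 10),   # unconditional jump: follow the target only
    8: ("sub", 2, None),
    10: ("ret", 1, None),
}

TARGET_ONLY = {"jmp"}                 # assumed: only the destination is followed
TARGET_AND_NEXT = {"jz", "jnz", "call"}  # assumed: destination and next instruction

# Hypothetical instruction groups (feature indices); see [13] for the real list.
GROUPS = {"xor": 0, "add": 1, "sub": 1, "jz": 2, "jmp": 2, "ret": 3}
NUM_GROUPS = 4


def follow(addr, path=(), visited=frozenset(), max_len=32):
    """Return every execution chain (tuple of mnemonics) reachable from addr."""
    # Assumed termination criteria: unknown address, loop, or length cap.
    if addr not in PROGRAM or addr in visited or len(path) >= max_len:
        return [path]
    mnemonic, size, target = PROGRAM[addr]
    path = path + (mnemonic,)
    visited = visited | {addr}
    if mnemonic == "ret":
        return [path]
    if mnemonic in TARGET_ONLY:
        return follow(target, path, visited, max_len)
    chains = []
    if mnemonic in TARGET_AND_NEXT:
        chains += follow(target, path, visited, max_len)
    # Continue with the code directly after the current instruction.
    chains += follow(addr + size, path, visited, max_len)
    return chains


def spectrum(path):
    """Relative frequency of each instruction group along one execution chain."""
    counts = Counter(GROUPS[m] for m in path if m in GROUPS)
    total = sum(counts.values())
    return [counts[g] / total if total else 0.0 for g in range(NUM_GROUPS)]


if __name__ == "__main__":
    for chain in follow(0):
        print(chain, spectrum(chain))
```

In this sketch the classifier itself is left out; a network trained as described above would take each spectrum as input and report a possible shellcode when its output exceeds zero.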

Results:

HDE was evaluated with six shellcode engines. Three publicly available engines

can be used to generate polymorphic shellcodes: ADMMutate [7], CLET [4] and

JempiScodes [17]. Using the knowledge gained from investigating these engines,

we also considered alternative methods of generating polymorphism. As a result,

we developed three independent shellcode engines which are based on different

concepts.

In what follows, we will call these engines EE1, EE2 and EE3 (Experimental

Engine). The purpose of these engines was to improve our detection mechanism

by experimenting with concepts that could possibly evade HDE. EE1 is based

on inserting junk instructions and XOR encryption. Such a mechanism was also

proposed by the authors of [4]. EE2 uses the Tiny Encryption Algorithm (TEA)

to encrypt the payload. EE3 applies random chains of simple instructions to

the payload in order to transform it. The inverted instruction

chain serves simultaneously as decryption engine and key.
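
The idea behind EE3 can be sketched as follows. This is only an illustration under assumptions, not the authors' engine: the set of byte operations (xor, add, sub, rol), the operand range and all function names are hypothetical. The sketch shows why the inverted chain can act as both decoder and key: each operation is invertible, so applying the inverse operations in reverse order restores the payload.

```python
import random


def rol(b, k):
    """Rotate an 8-bit value left by k bits (k is reduced modulo 8)."""
    k %= 8
    return ((b << k) | (b >> (8 - k))) & 0xFF


# Hypothetical set of simple, invertible byte operations.
FORWARD = {
    "xor": lambda b, k: b ^ k,
    "add": lambda b, k: (b + k) & 0xFF,
    "sub": lambda b, k: (b - k) & 0xFF,
    "rol": rol,
}
INVERSE = {
    "xor": lambda b, k: b ^ k,
    "add": lambda b, k: (b - k) & 0xFF,
    "sub": lambda b, k: (b + k) & 0xFF,
    "rol": lambda b, k: rol(b, 8 - k),  # rotating by 8 - k undoes rol by k
}


def random_chain(length, rng):
    """Draw a random chain of (operation, operand) pairs."""
    names = sorted(FORWARD)
    return [(rng.choice(names), rng.randrange(1, 256)) for _ in range(length)]


def encode(payload, chain):
    """Apply every operation of the chain, in order, to each payload byte."""
    out = bytearray(payload)
    for name, k in chain:
        out = bytearray(FORWARD[name](b, k) for b in out)
    return bytes(out)


def decode(data, chain):
    """Apply the inverse operations in reverse order to recover the payload."""
    out = bytearray(data)
    for name, k in reversed(chain):
        out = bytearray(INVERSE[name](b, k) for b in out)
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(0)
    chain = random_chain(6, rng)
    payload = b"example payload"
    print(decode(encode(payload, chain), chain) == payload)
```

Because a new chain is drawn for every instance, both the transformed payload and the decoding instructions differ from one instance to the next, which is the kind of variation a signature-based detector would struggle with.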

HDE was evaluated by training six neural networks (one for each

polymorphic shellcode engine) and applying them to test data provided by the

six engines and to real data known to be free of shellcodes. The results are

shown in Table 1. To increase the detection accuracy for unknown engines, a new

network was trained with the positive training data used for the two best neural

networks (ADMMutate and EE3). In general, the evaluation shows that HDE is

able to detect engines not available during the training process.

2.2 Self-organizing Maps

Since we already applied the theory of Self-Organizing Maps in the context of

traffic classification (cf. [14]), we also wanted to see how they perform in anomaly

detection. For the theory of SOMs, we refer to [8].
