Massive Data Mining for Polymorphic Code Detection
Udo Payer, Peter Teufl, Stefan Kraxberger, and Mario Lamberger
Institute of Applied Information Processing and Communications,
Inffeldgasse 16a, 8010 Graz, Austria
{Udo.Payer, Peter.Teufl, Stefan.Kraxberger,
Mario.Lamberger}@iaik.tugraz.at
Abstract. Driven by the ongoing search for reliable anomaly-based
intrusion detection mechanisms, we investigated different statistical
methodologies for detecting polymorphic shellcode. This paper gives an
overview of existing approaches in the literature as well as a synopsis
of our efforts to evaluate the applicability of data mining techniques
such as Neural Networks, Self Organizing Maps, Markov Models and Genetic
Algorithms to polymorphic code detection. We then present our results
and conclusions.
1 Introduction
This paper is based on a set of known polymorphic shellcode generators
(ADMMutate [7], CLET [4], JempiScodes [17]) and discusses the effectiveness of
statistical methods such as neural networks (NN) [5], Self Organizing Maps (SOM)
[8] and finite Markov chains (MC) [20] for detecting malicious code. After
analyzing existing polymorphic shellcode detection techniques (such as FNORD
[16], APE [19] and Buttercup [12]), we developed several possible approaches.
All of them use only payload information, without any additional
information (e.g. header information).
For a good introduction to the concepts behind shellcodes and polymorphic
shellcodes, we refer to [1] and [4].
2 Data Mining Approaches
2.1 Hybrid Detection Engine Using Neural Networks - HDE
In [13], we proposed an HDE which uses three phases to detect polymorphic
shellcodes:
1. NOP zone detection: This phase searches the network traffic for
consecutive chains of predefined NOP instructions (taken from ADMMutate and
CLET). Whenever a chain exceeding a threshold length is found, the next
phase is triggered. To overcome the problem of short or missing NOP zones,
this phase is scalable and can be turned off completely.
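The NOP zone detection phase can be illustrated with a minimal sketch, under several assumptions that are not taken from [13]: the opcode set NOP_LIKE is a small illustrative sample of single-byte x86 instructions (not the actual ADMMutate or CLET lists), the names find_nop_zones and triggers_next_phase are hypothetical, and "turned off" is modeled as a threshold of None that always passes the payload on to the next phase.

```python
from typing import List, Optional, Tuple

# Illustrative single-byte NOP-like x86 opcodes: 0x90 (nop) and
# 0x40-0x4f (inc/dec on a 32-bit register). ASSUMPTION: this is a sample
# set for the sketch, not the instruction lists of ADMMutate or CLET.
NOP_LIKE = frozenset([0x90]) | frozenset(range(0x40, 0x50))


def find_nop_zones(payload: bytes, threshold: int,
                   nop_set: frozenset = NOP_LIKE) -> List[Tuple[int, int]]:
    """Return (start, length) of every run of NOP-like bytes whose
    length exceeds `threshold`."""
    zones = []
    run_start = None  # index where the current run began, if any
    for i, byte in enumerate(payload):
        if byte in nop_set:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start > threshold:
                zones.append((run_start, i - run_start))
            run_start = None
    # A run may extend to the very end of the payload.
    if run_start is not None and len(payload) - run_start > threshold:
        zones.append((run_start, len(payload) - run_start))
    return zones


def triggers_next_phase(payload: bytes, threshold: Optional[int]) -> bool:
    """Decide whether the next phase should run.

    ASSUMPTION: threshold=None models the phase being turned off, in
    which case every payload is handed to the next phase.
    """
    if threshold is None:
        return True
    return bool(find_nop_zones(payload, threshold))


if __name__ == "__main__":
    sample = bytes(5) + bytes([0x90]) * 40 + b"\xeb\x1f"
    print(find_nop_zones(sample, threshold=30))
    print(triggers_next_phase(b"abc", threshold=None))
```

Lowering the threshold makes the phase more sensitive to short NOP zones at the cost of more payloads reaching the later phases. Note that the range 0x40-0x4f overlaps with printable ASCII characters, so a realistic opcode set and threshold would have to be tuned against benign traffic.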
V. Gorodetsky, I. Kotenko, and V. Skormin (Eds.): MMM-ACNS 2005, LNCS 3685, pp. 448-453, 2005.