This approach presents an attempt to go beyond “sample matching.” The intention
is to concentrate efforts on the detection of the one generic feature of all computer
viruses, the “gene of self-replication,” which is typically present in computer viruses
and virtually unknown in legitimate software. This task will be performed not on
binary sequences, but on sequences of instructions that can be understood as letters in
an alphabet (the size of the alphabet is approximately equal to the
total number of instructions), with the understanding that although self-replication can
be achieved in a number of ways, this number is finite and is believed not to exceed
50. Therefore, the search for the “gene of self-replication” can be understood as the
search for particular words on the array of letters, almost like a crossword puzzle with
the following peculiarities.
First, instructions form multiple strings with a well-defined order of execution.
This feature simplifies the task by eliminating concern over the position of particular
words (strings of interest): all words are positioned along the string and should be
read from left to right, in the order of execution.
Second, the string of instructions that forms a word such as “replication,” which
represents a particular self-replication procedure, does not have to be contiguous. In
the process of execution, the self-replication task can be temporarily interrupted to
perform malicious or auxiliary subtasks, for example a display of offensive messages.
This makes the search more difficult. It requires the search to expand from finding the
word “replication” as a contiguous string of letters to searching for a letter “r” that is
eventually followed by the letter “e,” which is eventually followed by the letter “p,” etc.
Fortunately, there are some decryption and deciphering techniques that could be utilized
for this problem.
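The search for a word whose letters may be separated by unrelated instructions can be illustrated as a subsequence test. The following is a minimal sketch, not the detection procedure itself: the function name `contains_subsequence` and the instruction names in the second example (`open_self`, `read_self`, `open_target`, `write_target`, `show_message`) are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Sequence


def contains_subsequence(trace: Iterable[str], pattern: Sequence[str]) -> bool:
    """Return True if every item of `pattern` appears in `trace` in order.

    Items of `pattern` need not be adjacent in `trace`; any number of
    unrelated items may occur between two consecutive matches.
    """
    matched = 0  # number of pattern items found so far
    for item in trace:
        if matched < len(pattern) and item == pattern[matched]:
            matched += 1
    return matched == len(pattern)


# Letter analogy from the text: "r" eventually followed by "e", then "p", ...
noisy_word = "rxepxxlicyatizon"   # "replication" with extra letters mixed in
scrambled = "replicatoin"         # same letters, wrong order near the end

found_in_noisy = contains_subsequence(noisy_word, "replication")
found_in_scrambled = contains_subsequence(scrambled, "replication")

# The same idea on a hypothetical instruction trace (names are assumptions).
pattern = ["open_self", "read_self", "open_target", "write_target"]
trace = ["open_self", "show_message", "read_self",
         "open_target", "show_message", "write_target"]
found_in_trace = contains_subsequence(trace, pattern)
```

Because the trace is read once from left to right, in the order of execution, the check takes time proportional to the length of the trace for each candidate word.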
Third, malicious code can arrive partially encoded and decode itself prior to
execution, which presents a serious challenge for any virus detection method. This
difficulty, however, could be addressed through periodic interruption of the execution of
the code in question and analysis of the composition of the executable image. Another
approach involves monitoring and analyzing the sequence of macro commands
presented for execution.
Although there are questions about the feasibility of detecting a computer virus by
subjecting its code to a cryptographic analysis, this approach concentrates on a very
narrow task: the detection of a particular feature of a malicious code, its “gene of self-
replication.” In addition, the proposed detection procedure will analyze not a “static”
file containing the code in question, but the sequence of executable instructions that
evolves during the execution of the code. Finally, a probabilistic approach resulting in
the computation of the conditional probability of maliciousness subject to particular
features discovered in the executable code can be utilized. Indeed, while according to
[2] sufficient conditions for the detection of computer viruses may not exist in the
mathematical sense, this approach is aimed at establishing the necessary conditions and
then utilizing these conditions for the development of instrumental, general-purpose,
anti-virus software capable of detecting new, previously unknown, computer viruses.
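The conditional probability of maliciousness given a discovered feature can be illustrated with Bayes' rule. This is a sketch under stated assumptions only: the text does not specify how the probability is computed, the function name `posterior_malicious` is hypothetical, and the numeric values below are invented for illustration rather than measured.

```python
def posterior_malicious(prior: float,
                        p_feature_given_malicious: float,
                        p_feature_given_benign: float) -> float:
    """Return P(malicious | feature observed) using Bayes' rule.

    prior                      -- P(malicious) before inspecting the code
    p_feature_given_malicious  -- P(feature | malicious)
    p_feature_given_benign     -- P(feature | benign)
    """
    evidence = (p_feature_given_malicious * prior
                + p_feature_given_benign * (1.0 - prior))
    if evidence == 0.0:
        raise ValueError("the feature has zero probability under both hypotheses")
    return p_feature_given_malicious * prior / evidence


# Illustrative, made-up numbers: a self-replication pattern that is common in
# viruses and rare in legitimate software.
posterior = posterior_malicious(prior=0.01,
                                p_feature_given_malicious=0.9,
                                p_feature_given_benign=0.001)
```

The rarer the feature is in legitimate software, the more strongly its presence raises the posterior, which is consistent with focusing on a feature that is virtually unknown outside of viruses.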
First, several typical sequences of instructions that implement the task of self-
replication will need to be established. These sequences will constitute the set of
“words” or “patterns” that would provide evidence that the code may be a computer
virus. Constructing a number of alternative self-replication procedures and subjecting
them to special analyses/parsing in order to detect their generic semantic features will
accomplish this task. At the same time, known computer viruses will be subjected to
analyses aimed at the detection of their “gene of self-replication” and will attempt to