Information Technology Reference
In-Depth Information
a one-to-one mapping of the native instruction address to the imaginary IR address.
For example, in Table 1, the x86 instruction
movsx edx, byte ss:[ebp-10]
at address
0x401073
will always be mapped to a list of REIL instructions from
0040107300
to
0040107305
. Therefore, it is easy to map the analysis results back to the native forms
before reporting them to the user.
REIL has a simple
register-based
architecture, which can keep native registers and
create temporary registers when needed. Preserving native registers is particularly use-
ful for implementing the offline concrete and symbolic (or concolic) execution. Recall
that in concolic execution, the program state has to be saved during trace generation
and later reconstructed during the offline analysis. At runtime, our trace generator will
only save the native program state (related native registers and global memory). During
the offline analysis, we can compute the IR program state directly from these native
registers and the memory.
In all of the seventeen REIL instructions, the
destination
operand can be represented
by a mathematical or logical formula of the
source
operands. Consider the second native
instruction
00401077 cmp edx, 0x62
in Table 1. Notice that the REIL instructions use
a few basic mathematical and logical operations to precisely compute all the
eflags
;
in other words, all the
eflags
can be represented as an expression in terms of
edx
and
0x62
. For example,
ZF = (edx 98) and 0xffffffff
.Insomesense,REIL
instructions are compatible with the input language of the satisfiability modulo theory
(SMT) solver Z3 [18], which supports the theories of integers, bit-vectors, and arrays.
3.2
Symbolic Execution
The symbolic execution procedure consists of three steps:
1.
Mark taint source and symbolize its value.
Here, taint sources refer to the untrusted
data in the target program. When a program variable is marked as a taint source, our
tool symbolizes the variable, by replacing its concrete value with a symbolic one (a
free variable). Traditionally, the taint sources are program inputs. However, during
interactive
security analysis, the user may be interested in tracking other program
variables as well. For example, some sensitive data items such as the password
and the registry key may become the focus of the analysis. At any time during the
program execution, CBASS can mark any byte in any register or at any memory
location as the taint source.
2.
Symbolic execution of REIL instructions.
CBASS implements the symbolic execu-
tion engine based on the REIL IR. As we have already mentioned, the semantics
of REIL instructions can be close to that of the input language of the SMT solvers.
Therefore, the symbolic encoding procedure, which takes an IR trace as input and
returns an SMT formula, is straightforward. In our implementation of the proposed
system, we have used the Z3 SMT solver, which is capable of solving formulas
expressed in the theories of bit-vectors and arrays.
3.
Check taint sink to construct constraint.
Depending on the application, security
analysts may mark different memory location or register at some interesting point
as the taint sinks. For example, to generate potential exploits, the taint sinks are
usually registers such as
EIP
. We may create a constraint to steer the execution into
a desired code section and make
EIP
equals to the address of that code section.