Information Technology Reference
In-Depth Information
In symbolic execution one permits either uninitialized program locations or,
more generally, program locations that are initialized with symbolic expressions.
The following
program orders the values of
x
and
y
: after its execution
x
contains the maximum of
x
0
,y
0
and
y
their minimum.
intx=
x
0
; int y =
y
0
; int z = max(x,y);
if(x<z){y=x;x=z;}
PL
We use location-value pairs to represent states in symbolic execution. The ex-
pression
l
1
:=
t
1
|| · · · ||
l
n
:=
t
n
}
denotes a symbolic state in which each
program location of the form
l
i
has the expression
t
i
as its symbolic value.
After symbolic execution of the first three statements of the program above
we obtain the symbolic state
U
{
.
Symbolic execution of the conditional splits the execution into two branches,
because the value
x
0
<
max(
x
0
,y
0
) of the guard expression is symbolic and
cannot be reduced. The (negated) value of the guard becomes a
path condi-
tion
relative to which symbolic execution continues. Under the path condition
P
1
≡
=
{
x
:=
x
0
||
y
:=
y
0
||
z
:= max(
x
0
,y
0
)
}
x
0
<
max(
x
0
,y
0
) the body of the conditional is executed which results in
the final symbolic state
U
=
{
x
:= max(
x
0
,y
0
)
||
y
:=
x
0
||
z
:= max(
x
0
,y
0
)
}
.
=
y
0
which simplifies
From
P
1
and properties of max one can infer max(
x
0
,y
0
)
U
to
{
x
:=
y
0
||
y
:=
x
0
||
z
:=
y
0
}
. The other branch terminates immediately in
x
0
.
=max(
x
0
,y
0
)).
It is obvious already from this small example that simplification of inter-
mediate states wrt first-order theories is essential for eciency and to obtain
intuitive results. Modern symbolic execution engines use SMT solvers [9,13] and
also powerful built-in theorem provers [3,11] for this purpose.
The example suggests that a single state during symbolic execution of a
program
p
consists of the following three components:
state
U
under path condition
P
2
≡
x
0
≥
max(
x
0
,y
0
)(
≡
1. A program pointer to the next executable statement of the remaining state-
ments in
p
that have to be executed.
2. A path condition
P
relative to which the remaining statements are executed.
3. A symbolic state
U
relative to which the remaining statements are executed.
Symbolic execution of a program is then arranged as a
symbolic execution tree
whose nodes are triples consisting of program pointer, path condition, and sym-
bolic state.
In general it is not possible to symbolically execute a program fully, because
unbounded loops give rise to infinitely many branches with differing symbolic
path conditions. Loop invariants or induction are required to turn symbolic exe-
cution into a complete method for computing strongest post-states of programs.
2.3 Partial Evaluation
The ideas behind partial evaluation go back in time even further than those be-
hind symbolic execution: Kleene's well-known
s
mn
theorem from 1943 states that
for each computable function
f
(
x
,
y
)where
x
=
x
1
,...,x
m
,
y
=
y
1
,...,y
n
there