Information Technology Reference
In-Depth Information
other. Furthermore, let m , m be machines, op 1 , op 2
∈{
create , delete , update
}
,
then we define the orthogonality relation
on actions by:
op 2 ( m, π )if π
π
op 1 ( m, π )
In general, actions are considered orthogonal if they reside on different machines,
thus:
op 2 ( m )if m
= m
op 1 ( m, π )
op 1 ,op 2 ∈{
create , delete , update , rename
}
We overload the use of
to also capture idem-potency of actions. Actions that
can be considered idempotent, such as two consecutive updates to the same file,
are added to the relation.
Partial order reduction, based on
, is implemented by representing the vo-
cabulary of actions
L action =
{
a 1 ,...,a m }
using an mapping por from
{
1 ,...,m
}
into 2 { 1 ,...m} such that a i
por ( i ). The sets por ( i )areimple-
mented as bit-vectors, as m is relatively small and of fixed size. Depth first search
then prunes action sequences containing the pair a i a j if a i
a j , if and only if j
a j (that would be
j
por ( i )) and j
i .
8.6 Experiments
We ran simulation relatively early in the development process. As the product
got more stable we ran a two week experiment, distributing the search over a
cluster of 200 machines each exploring a different portion of the search space.
This helped us covering slightly more than 2 trillion scenarios, for checking main
consistency properties.
Early on, simulation caught a large number of bugs that may have been caught
later in stress. On the other hand, simulation served as a pretty good regression
test as the implementation was evolving at a rapid pace. Some of the bugs found
during simulation had the traits of being extremely dicult for a stress test to
identify. For instance, the trace below exposed a corner case in the interplay
between the components that recycled unique identifiers and those that walked
directories.
create ( m 2 , noshare/p )
rename ( m 2 , noshare/p, share/p )
read
journal ( m 2 )
sync ( m 1 ,m 2 )
rename ( m 2 , share/p, noshare/p )
rename ( m 2 , noshare/p, share/p )
read
journal ( m 2 )
update ( m 1 , share/p )
sync ( m 1 ,m 2 )
In comparison, the bugs found in stress were predominantly in the components
that are abstracted away during simulation. There were a couple of exceptions,
though. Stress exposed a divergence bug that simulation was blind to, it also
exposed a protocol error exposed by asynchronous message passing.
Search WWH ::




Custom Search