Information Technology Reference
In-Depth Information
impression was that we had the luxury of tackling a relatively well defined prob-
lem; to build a replication system specifically handling features of the file system
NTFS, for replicating files between globally dispersed branch oces of corpora-
tions. Later on, it would turn out that DFS-R could be embedded within other
scenarios, such as, in an instant messenger product. However, we consciously
avoided over-loading with features from the onset. It means that DFS-R, for in-
stance does not replicate files synchronously, only asynchronously (as it is meant
for wide area networks); does not replicate general directed acyclic graphs, only
tree-like structure; and does not maintain fine-grained tracking of operations,
only state. While several such problems are interesting in other contexts, they
did not fall into the scope of our original goals.
The organization of this paper follows the top-down design flow of DFS-R.
The DFS-R system was originally conceived as a strictly state-based file repli-
cation protocol. Section 2 elaborates on the differences between state-based and
operations-based replication systems. We developed a high-level state machine
specification of DFS-R by using a transition system presented as a collection of
guarded commands. The guarded commands were subsequently implemented as
an applicative program in OCaml. This paved the way for performing ecient
state space exploration on top of the design. Section 3 elaborates on the proto-
col, and Section 4 summarizes prototyping experiences. As the development took
place, several assumptions made in the abstract design turned out to be unreal-
istic, and we redid the high-level design using the AsmL tools that were built at
Microsoft for software modeling and test case generation. Section 5 elaborates
on the experiences from using AsmL. A number of well-separated distributed
protocol problems emerged during the development. Section 6 describes the dis-
tributed tree reconciliation problem, and how we used a model checker, Zing, to
expose bugs in both protocol proposals and legacy implementations. Section 7
describes the distributed tombstone garbage collection problem and a solution to
it. While one cannot expect to get anywhere without a high-level understanding
of the protocols involved in DFS-R, it is equally unrealistic to expect develop-
ing a production quality system without addressing systems problems. We were
thus faced with a potentially large gap between simplified protocol substrates
and the production code. Encouraged by the ability of the model-based state
space exploration to expose subtle interaction bugs we repeated the state space
exploration experiment on top of the production core. The resulting backtrack-
ing search tool may best be characterized as a hybrid software model checking,
run-time verification tool. It operates directly at the source code level. It uses
techniques, such as partial order reduction to prune search and custom allocation
routines to enable backtracking search. Section 8 describes the infrastructure we
developed and the experiments covering
1
2
trillion scenarios.
2
File Replication
The style of replication systems under which DFS-R falls into is surveyed ex-
tensively in [1]. We here summarize a few of the main concepts relevant for
 
Search WWH ::




Custom Search