Information Technology Reference
In-Depth Information
Models and Software Model Checking of
a Distributed File Replication System
Nikolaj Bjørner
Microsoft Research, One Microsoft Way, Redmond, WA, 98074, USA
nbjorner@microsoft.com
Abstract. With the Distributed File System Replication component,
DFS-R, as the central theme, we present selected protocol problems and
validation methods encountered during design and development. DFS-R
is currently deployed in various contexts; in Windows Server 2003-R2,
Windows Live Messenger (Sharing Folders), and Windows Vista (Meet-
ing spaces). The journey from an initial design sketch to a shipped prod-
uct required mainly the dedicated effort of several testers, developers,
program managers, and several others; but in some places cute problems
related to distributed consensus and software model-checking emerged.
This paper presents a few of these, including a distributed garbage col-
lection problem, distributed consensus problems for reconciling tree-like
data structures, using model-based test case generation, and the use of
software model checking in design and development process.
1
Introduction
Designing and building distributed systems is challenging, especially if they need
to scale, perform, satisfy customer functionality requirements, and, oh well, work.
An example of a particularly challenging distributed system is multi-master, op-
timistic, file replication. One of the distinguished factors making distributed file
replication hard is that file replication comes with a very substantial data com-
ponent: the protocols need to be su ciently aware of file system semantics, such
as detecting and resolving name conflicting file creates and concurrent updates.
Such races are just the tip of the iceberg. In comparison, cache coherence proto-
cols that are known to be challenging to design, have a trivial data component,
but to be fair have stronger consistency requirements.
Subtle protocol bugs can go (and have indeed gone) undetected for years due
to the large number of interactions that are possible. With a sucient number
of deployments they will be encountered in the field, have costly consequences,
and be extremely challenging to analyze. Our experience in developing DFS-
R from the bottom up, is used to demonstrate several complementary uses of
model-based techniques for system design and exploration. This paper provides
an experience report on these selected methods. Note that the material presented
here reflect only a very partial view of the design and test of DFS-R.
DFS-R was developed to address correctness, scale, and management chal-
lenges encountered with a predecessor file replication product. Thus, the original
 
Search WWH ::




Custom Search