must also somehow track the correctness of the current configuration or the
validity of its coordination efforts.
Managing these requirements in a distributed environment is a notoriously
difficult problem, and attempts to solve it often lead to incorrect server behavior.
Alternatively, if the problems are not addressed, they can lead to single
points of failure in the distributed system. For “offline” processing systems,
this is only a minor concern as the single point of failure can be
re-established manually. In real-time systems, this is more of a problem as
recovery introduces potentially unacceptable delays in processing or, more
commonly, missed processing entirely.
This leads directly to the motivation behind configuration and coordination
systems: providing a system-wide service that correctly and reliably
implements distributed configuration and coordination primitives.
These primitives, similar to the coordination primitives provided for
multithreaded development, are then used to implement distributed
versions of high-level algorithms.
Maintaining Distributed State
Writing concurrent code that shares state within a single application is hard.
Even with operating systems and development environments providing
support for concurrency primitives, the interaction between threads and
processes is still a common source of errors. The problems are further
compounded when the concurrency spans multiple machines. Now, in
addition to the usual problems of concurrency—deadlocks, race conditions,
and so on—there are a host of new problems to address.
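For comparison, the single-application case can be sketched in a few lines of Python. This is only an illustration of the kind of in-process primitive the text refers to: the Counter class and the run_workers helper are hypothetical names invented for this sketch, and the lock shown works only because every thread shares one process's memory.

```python
import threading


class Counter:
    """Shared state guarded by a lock, an in-process coordination primitive."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write below atomic with respect
        # to other threads in this process.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(num_threads=4, increments_per_thread=10_000):
    """Increment one shared Counter from several threads; return the total."""
    counter = Counter()

    def work():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers())
```

Once the workers run on different machines there is no shared memory to hold such a lock, which is the gap a coordination service is meant to fill.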
Unreliable Network Connections
Even in the most well-controlled datacenter, networks are unreliable
relative to a single machine. Latency can vary widely from moment to
moment, bandwidth can change over time, and connections can be lost.
In a wide area network, a “backhoe event” can sever connections between
previously unified networks. For concurrent applications, this loss of
connectivity (which can also happen within a single datacenter) is the worst problem.
In concurrent programming, the loss of connectivity between two groups
of systems is known as the “Split Brain Problem.” When this happens, a