Service Configuration and Coordination - Real-Time Analytics

A distributed system must decide how to handle the fact that some amount of state is suddenly inaccessible. Several strategies are used in practice, though the most common is to enter a degraded state. For example, when connectivity loss is detected, many systems disallow changes to the distributed state until connectivity has been restored.
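Here is a minimal sketch of this degraded mode. It assumes a hypothetical `PartitionAwareStore` whose `connected` flag is set by some external failure detector, which is not shown. Reads continue to be served from local state, which may be stale. Writes are refused until connectivity returns.

```python
class PartitionAwareStore:
    """Hypothetical key-value store that degrades to read-only on partition."""

    def __init__(self):
        self._data = {}
        # Assumption: in a real system a failure detector would flip this flag.
        self.connected = True

    def read(self, key):
        # Reads are still answered from local state, which may be stale.
        return self._data.get(key)

    def write(self, key, value):
        # Changes to the shared state are refused until connectivity returns.
        if not self.connected:
            raise RuntimeError("partitioned: store is read-only")
        self._data[key] = value


store = PartitionAwareStore()
store.write("config/version", 1)
store.connected = False  # the failure detector reports lost connectivity
print(store.read("config/version"))  # 1
try:
    store.write("config/version", 2)
except RuntimeError as err:
    print(err)  # partitioned: store is read-only
```

Refusing writes, rather than queueing them, is the simplest way to avoid conflicting changes on opposite sides of a partition. The cost is that the system is unavailable for updates until the partition heals.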

Another strategy is to allow one of the partitions to remain fully functional while degrading the capabilities of the other partition. This is usually accomplished using a quorum algorithm, which requires that a certain number of servers be present in a partition before it is allowed full functionality. A common approach is to run an odd number of servers and require that a fully functional partition contain a strict majority of them. If an odd number of servers is split into two groups, one group always contains an odd number of servers and the other an even number. The two groups therefore can never be the same size, and exactly one of them holds the majority. That group remains fully functional, and the other becomes read-only or has otherwise degraded functionality.
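The quorum rule can be sketched as a strict-majority check. This assumes every server knows the total cluster size in advance. `has_quorum` and `split_outcome` are hypothetical names used only for illustration. A strict majority guarantees that at most one side of a split stays fully functional. An odd cluster size guarantees that one side always has it.

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """Hypothetical check: does a partition of `reachable` servers hold a strict majority?"""
    return reachable > cluster_size // 2


def split_outcome(cluster_size: int, left: int) -> tuple[bool, bool]:
    """For a two-way split, report which side (left, right) keeps full functionality."""
    right = cluster_size - left
    return has_quorum(left, cluster_size), has_quorum(right, cluster_size)


# With five servers, every possible two-way split leaves exactly one majority side.
for left in range(6):
    print(left, 5 - left, split_outcome(5, left))

# With four servers, a 2/2 split leaves neither side with a majority.
print(split_outcome(4, 2))  # (False, False)
```

The even-sized case shows why odd cluster sizes are preferred. A tie leaves no partition able to accept changes, so the whole system degrades.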

Clock Synchronization

It may seem like a simple thing, but distributed systems often require some sort of time synchronization. Depending on the application, this synchronization may need to be fairly precise. Unfortunately, the hardware clocks found in servers are not perfect and tend to drift over time. If they drift far enough apart, one server can receive an event whose timestamp lies in the future according to its own clock, which leads to some surprising processing results. For example, an analysis system that computes the difference between the timestamps of two types of events might start to observe negative durations.
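The negative-duration problem can be reproduced with a small, deterministic sketch. Two hypothetical servers stamp related events. Server A's clock is assumed accurate, and server B's is assumed to run 2 seconds behind. The hypothetical helper `checked_duration` shows one way to flag such results instead of silently accepting them.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical scenario: server A's clock is accurate, while server B's clock
# has drifted 2 seconds behind. The offset is an assumption for illustration.
CLOCK_B_OFFSET = timedelta(seconds=-2)
TRUE_START = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)


def stamp_on_a(true_time: datetime) -> datetime:
    return true_time  # A reports the true time


def stamp_on_b(true_time: datetime) -> datetime:
    return true_time + CLOCK_B_OFFSET  # B reports a skewed time


def checked_duration(start: datetime, end: datetime) -> timedelta:
    """Return end - start, refusing results that can only come from clock skew."""
    delta = end - start
    if delta < timedelta(0):
        raise ValueError(f"negative duration ({delta.total_seconds()} s), likely clock skew")
    return delta


# A "request" event is stamped on A; the matching "response" really
# happens 500 ms later but is stamped on B.
request_ts = stamp_on_a(TRUE_START)
response_ts = stamp_on_b(TRUE_START + timedelta(milliseconds=500))

naive = response_ts - request_ts
print(naive.total_seconds())  # -1.5
```

Calling `checked_duration(request_ts, response_ts)` raises `ValueError` instead of returning -1.5 seconds. Whether to reject, flag, or clamp such values is a design choice. Rejecting or flagging at least keeps skew-induced values from quietly distorting aggregates such as average latency.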

In most circumstances, servers are synchronized using the Network Time Protocol (NTP). Although this still allows sometimes-significant drift between machines, it is usually “close enough.” Problems can arise, however, when it is not possible to synchronize all machines to the same set of NTP servers. This sometimes happens in environments that have a secure internal domain that communicates with an external-facing domain through a very limited gateway. In that case, an internal NTP server can drift away from the NTP servers used by external users. In some situations, such as application programming interfaces (APIs) that use time to manage authorization, the drift can cause profound failures of the service. For example, otherwise valid credentials or requests can be rejected because, by the server's clock, they appear expired or not yet valid.
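To see how drift turns into authorization failures, consider a sketch of a time-window check. It assumes a hypothetical API that accepts a time-stamped request only if the timestamp is within a fixed tolerance of the server's clock. `is_request_fresh` and the five-minute `MAX_SKEW` are illustrative assumptions, not any particular service's rules.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance: how far a request timestamp may differ from the
# server's clock before the request is rejected.
MAX_SKEW = timedelta(minutes=5)


def is_request_fresh(request_ts: datetime, server_now: datetime,
                     max_skew: timedelta = MAX_SKEW) -> bool:
    """Accept a time-stamped request only if it is within max_skew of server time."""
    return abs(server_now - request_ts) <= max_skew


# The client's clock follows the external NTP servers; the API server's clock
# follows an internal NTP server assumed to have drifted ahead by some minutes.
client_now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
for drift_minutes in (1, 4, 7):
    server_now = client_now + timedelta(minutes=drift_minutes)
    print(drift_minutes, is_request_fresh(client_now, server_now))
```

With a five-minute window, drifts of 1 and 4 minutes are tolerated. A 7-minute drift makes every request from that client fail, even though nothing else about the requests is wrong.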
