Database Reference
In-Depth Information
Handling failures
Failures in a distributed system have many causes, such as node failures,
disk failures, network issues, power failures, and natural or man-made disasters. Data
loss is not acceptable at any cost. DynamoDB uses different techniques to
handle failures of the following types:
• Temporary failures
• Permanent failures
For temporary node failures, DynamoDB uses a quorum-like technique to
determine read and write consistency, because it has to account for network and node
failures. It does not enforce a strict quorum, which would make data unavailable
whenever a designated replica node is unreachable; instead, it uses the sloppy quorum
technique, which allows a commit on a successful vote from the first N healthy nodes
of the cluster.
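The difference between a strict and a sloppy quorum can be illustrated with a small sketch. This is not DynamoDB's actual implementation; the function name sloppy_quorum_targets, the node names, and the health map are all hypothetical and are used only to show how replicas go to the first N healthy nodes rather than to a fixed set of N nodes.

```python
from typing import Dict, List


def sloppy_quorum_targets(
    ring_order: List[str], healthy: Dict[str, bool], n: int
) -> List[str]:
    """Return the first n healthy nodes in ring order (hypothetical helper).

    ring_order is the order in which nodes would be visited for a key.
    Nodes missing from the health map are treated as unhealthy.
    """
    candidates = [node for node in ring_order if healthy.get(node, False)]
    return candidates[:n]


# Assumed five-node cluster in which node B is temporarily down.
ring = ["A", "B", "C", "D", "E"]
health = {"A": True, "B": False, "C": True, "D": True, "E": True}

# A strict quorum would insist on the first three nodes, including the failed B.
strict_targets = ring[:3]

# A sloppy quorum skips B and takes the next healthy node instead.
sloppy_targets = sloppy_quorum_targets(ring, health, 3)
```

In this sketch the write still reaches three nodes while B is down, which is what keeps the system available during a temporary failure.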
If a node fails, the replica that should reside on the failed node is persisted to some
other available node. DynamoDB keeps metadata for all such entries in a table that
is scanned frequently. This is done to maintain the durability and availability promise.
The replica that was copied to the other node carries a hint identifying the node on
which it was intended to be stored. Once the failed node is back, the replica is
restored on that node and the metadata is updated. This strategy is called hinted
handoff.
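The hinted handoff flow described above can be sketched as follows. This is a simplified, in-memory illustration under stated assumptions, not DynamoDB's internal code: the Node class, the write and handoff functions, and the representation of a hint as an (intended owner, key, value) tuple are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical storage node holding its own data plus hinted replicas."""
    name: str
    up: bool = True
    data: Dict[str, str] = field(default_factory=dict)
    # Each hint is (intended_owner_name, key, value).
    hints: List[Tuple[str, str, str]] = field(default_factory=list)


def write(key: str, value: str, ring: List[Node], n: int) -> None:
    """Write to the first n nodes; divert replicas meant for down nodes."""
    intended = ring[:n]
    fallbacks = [node for node in ring[n:] if node.up]
    for owner in intended:
        if owner.up:
            owner.data[key] = value
        elif fallbacks:
            # Store the replica elsewhere, tagged with its intended owner.
            stand_in = fallbacks.pop(0)
            stand_in.hints.append((owner.name, key, value))


def handoff(ring: List[Node]) -> None:
    """Scan all hints and deliver those whose intended owner is back up."""
    by_name = {node.name: node for node in ring}
    for node in ring:
        remaining = []
        for owner_name, key, value in node.hints:
            owner = by_name[owner_name]
            if node.up and owner.up:
                owner.data[key] = value  # restore the replica on its owner
            else:
                remaining.append((owner_name, key, value))
        node.hints = remaining  # delivered hints are dropped from the metadata
```

A typical run would mark one node as down, call write so that another node stores the hinted replica, bring the failed node back, and then call handoff periodically, which corresponds to the frequent scan of the metadata described above.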