How DynamoDB Works - Mastering DynamoDB

Handling failures

Failures in a distributed system can have many causes, such as node failures, disk failures, network issues, power failures, or natural or man-made disasters. Data loss is not acceptable at any cost. DynamoDB uses several techniques to handle failures of the following types:

• Temporary failures

• Permanent failures

For temporary node failures, DynamoDB uses quorum-like techniques to determine read and write consistency, and these have to account for network and node failures. For this reason, DynamoDB does not enforce a strict quorum; instead, it uses the sloppy quorum technique, which allows a commit on a successful vote from the first N healthy nodes of a cluster.
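The node selection described above can be sketched in a few lines of Python. This is only an illustration, not DynamoDB's actual implementation: the `Node` class, the `preference_list` ordering, and the parameters `n` (replicas) and `w` (acknowledgements needed for a commit) are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a storage node.
    name: str
    healthy: bool = True
    store: dict = field(default_factory=dict)


def first_n_healthy(preference_list, n):
    """Walk the preference list in order and keep the first n healthy nodes."""
    return [node for node in preference_list if node.healthy][:n]


def sloppy_write(preference_list, key, value, n=3, w=2):
    """Write to the first n healthy nodes; commit if at least w of them ack.

    Assumption: every healthy node acknowledges the write immediately.
    """
    targets = first_n_healthy(preference_list, n)
    acks = 0
    for node in targets:
        node.store[key] = value
        acks += 1
    return acks >= w, [node.name for node in targets]


# B is down, so the write skips it and lands on A, C, and D instead.
ring = [Node("A"), Node("B", healthy=False), Node("C"), Node("D"), Node("E")]
ok, written_to = sloppy_write(ring, "user#1", {"name": "Ann"})
print(ok, written_to)  # True ['A', 'C', 'D']
```

A strict quorum would insist on the fixed set A, B, C and could stall while B is down; the sloppy variant trades that fixed membership for availability.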

If a node fails, the replica that should reside on the failed node is persisted to some other available node. DynamoDB keeps metadata for all such entries in a table that is scanned frequently. This is done to maintain the durability and availability promise. The replica that was copied to the other node carries a hint identifying the node where it was originally meant to be stored. Once the failed node is back, the replica is restored on that node and the metadata is updated. This strategy is called hinted handoff.
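A minimal sketch of hinted handoff, under stated assumptions, might look like the following. The `Node` class, the `hinted` dictionary (standing in for the metadata table), and the functions `write_replica` and `hand_off` are hypothetical names chosen for this example; they do not reflect DynamoDB's internal API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a storage node.
    name: str
    healthy: bool = True
    store: dict = field(default_factory=dict)   # replicas this node owns
    hinted: dict = field(default_factory=dict)  # key -> (intended node name, value)


def write_replica(intended, fallback, key, value):
    """Store on the intended node, or on the fallback with a hint if it is down."""
    if intended.healthy:
        intended.store[key] = value
        return intended.name
    fallback.hinted[key] = (intended.name, value)
    return fallback.name


def hand_off(fallback, nodes_by_name):
    """Scan hinted entries and deliver those whose intended node has recovered."""
    for key, (target_name, value) in list(fallback.hinted.items()):
        target = nodes_by_name[target_name]
        if target.healthy:
            target.store[key] = value
            del fallback.hinted[key]  # metadata updated: the hint is no longer needed


b, d = Node("B", healthy=False), Node("D")
nodes = {"B": b, "D": d}
write_replica(b, d, "user#1", "Ann")  # B is down, so D keeps the hinted copy
b.healthy = True                      # B recovers
hand_off(d, nodes)                    # the periodic scan restores the replica on B
print(b.store, d.hinted)              # {'user#1': 'Ann'} {}
```

Keeping hinted replicas separate from the node's own data makes the periodic scan cheap, since only the entries awaiting delivery are examined.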
