Masterless replication
If you've worked with a relational database in production, it's likely you have experience
with replication. Relational databases typically provide master-follower replication, in
which all data is written to a single master instance; then, behind the scenes, the writes are
replicated to follower instances. The application can read data from any of the followers.
Note that master-follower databases are not distributed: every machine has a full copy of
the dataset. Master-follower replication is great for scaling up the processing power
available for handling read requests, but does nothing to accommodate arbitrarily large datasets.
Master-follower replication also provides some resilience against machine failure: in
particular, failure of a machine will not result in data loss, since other machines have a full
copy of the same dataset.
However, a master-follower architecture cannot guarantee full availability in the case of
hardware failure. In particular, if the master instance fails, the application will be unable to
write any data until the master is restored, or one of the followers is promoted to become
the new master. The process of promoting a new master can be automated using built-in
database features or third-party tools, but there will still be some downtime during which
the application cannot write data.
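The write-availability gap described above can be illustrated with a small toy model. This is only a sketch: the class and method names (MasterFollowerCluster, fail_master, promote_follower) are hypothetical and do not correspond to any real database API, and replication is done synchronously for simplicity rather than behind the scenes.

```python
class MasterDownError(Exception):
    """Raised when a write arrives while no master is available."""


class MasterFollowerCluster:
    """Toy model: a single master accepts writes, followers serve reads."""

    def __init__(self, follower_count):
        self.master = {}  # the master's full copy of the dataset
        self.followers = [{} for _ in range(follower_count)]
        self.master_up = True

    def write(self, key, value):
        # All writes must go through the master.
        if not self.master_up:
            raise MasterDownError("no master available; write rejected")
        self.master[key] = value
        # Replicate to every follower (synchronously here, for simplicity).
        for follower in self.followers:
            follower[key] = value

    def read(self, key, follower_index=0):
        # Reads can be served by any follower, since each holds a full copy.
        return self.followers[follower_index].get(key)

    def fail_master(self):
        self.master_up = False

    def promote_follower(self, follower_index=0):
        # Promotion ends the write outage; the promoted node stops being a follower.
        self.master = self.followers.pop(follower_index)
        self.master_up = True
```

In this model, reads keep working after fail_master() is called, but every call to write() raises MasterDownError until promote_follower() runs. That interval is the downtime the text describes.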
Replication without a master
Cassandra solves this problem by simply removing the master instance from the picture. In
Cassandra, when a piece of data is written, the write is sent to all of the nodes that should
hold a copy of that data; no single node is authoritative. This neatly solves the availability
problem: with no master instance, there is no single point of failure. If a node becomes
unavailable, the data intended for it is still written to the other nodes that should store it; the
application need not halt writing data.
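A minimal sketch of this masterless write path follows. It is not Cassandra's implementation: the Node class and the write_to_replicas function are hypothetical names chosen for illustration, and an unavailable node is simulated with a flag.

```python
class Node:
    """Toy replica: stores data in memory and may be marked unavailable."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def store(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")
        self.data[key] = value


def write_to_replicas(replicas, key, value):
    """Send the write to every replica; return the names of nodes that stored it."""
    acked = []
    for node in replicas:
        try:
            node.store(key, value)
            acked.append(node.name)
        except ConnectionError:
            # No node is authoritative, so one failure does not block the write.
            continue
    return acked
```

With three replicas and one of them down, write_to_replicas still stores the value on the two remaining nodes and returns their names. The caller never has to wait for a master to be restored or promoted.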
Note
In fact, Cassandra is even more robust when a node is unavailable to receive a write.
Through a process called hinted handoff, other nodes in the cluster will store information
about the write request, and then replay that request to the missing node when it becomes
available again.
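The idea behind hinted handoff can be sketched in a few lines. This is a deliberate simplification and not how Cassandra implements it: the Node class, write_with_hints and replay_hints are hypothetical names, and the choice of the first live replica as the holder of the hint is an assumption made only to keep the example short.

```python
class Node:
    """Toy replica that can also hold hints for unavailable peers."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True
        self.hints = []  # (target_node, key, value) tuples awaiting replay

    def replay_hints(self):
        """Deliver stored hints whose target is back up; keep the rest."""
        remaining = []
        for target, key, value in self.hints:
            if target.up:
                target.data[key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining


def write_with_hints(replicas, key, value):
    """Write to live replicas and record a hint for each unavailable one."""
    live = [node for node in replicas if node.up]
    for node in live:
        node.data[key] = value
    if live:
        for node in replicas:
            if not node.up:
                # Assumption: the first live replica keeps the hint.
                live[0].hints.append((node, key, value))
    return [node.name for node in live]
```

If one of three replicas is down during a write, the value lands on the other two and a hint is recorded. Once the missing node is marked up again, calling replay_hints() on the node holding the hint delivers the write and clears the hint.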
Returning to our model of Cassandra replication from the previous section, we can now
expand it to account for replication. In particular, each virtual node is in fact stored on mul-