How Cassandra Distributes Data - Learning Apache Cassandra


target node for each individual row based on the mapping from its token to the new token range assignments of the four nodes. This is a process known as rebalancing, and it's rendered unnecessary by virtual nodes.
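A minimal sketch of rebalancing without virtual nodes, assuming a toy token space of 1,200 tokens (one row per token) and a ring split into equal, contiguous ranges, one per node. The names `TOKEN_SPACE`, `owner`, and `rebalance_transfers` are illustrative, not Cassandra APIs. The sketch counts which rows change owner when a three-node ring is re-split for four nodes:

```python
from collections import Counter

# Toy assumption: tokens are the integers 0..1199 and each token stands for
# one row. (Cassandra's Murmur3Partitioner uses tokens from -2**63 to 2**63 - 1.)
TOKEN_SPACE = 1200


def owner(token, num_nodes):
    """Node that owns `token` when the ring is split into `num_nodes`
    equal, contiguous token ranges (one range per node, no vnodes)."""
    return token * num_nodes // TOKEN_SPACE


def rebalance_transfers(old_n, new_n):
    """Count rows whose owner changes when the ring is re-split from
    `old_n` to `new_n` equal ranges, keyed by (source node, destination node)."""
    moves = Counter()
    for token in range(TOKEN_SPACE):
        src, dst = owner(token, old_n), owner(token, new_n)
        if src != dst:
            moves[(src, dst)] += 1
    return moves


if __name__ == "__main__":
    moves = rebalance_transfers(3, 4)
    for (src, dst), rows in sorted(moves.items()):
        print(f"node {src} -> node {dst}: {rows} rows")
    print("total moved:", sum(moves.values()), "of", TOKEN_SPACE)
```

In this toy setup, 100, 200, and 300 rows move from node 0 to node 1, node 1 to node 2, and node 2 to node 3 respectively. Half of the 1,200 rows change owner, and nodes 1 and 2 each send data to a neighbor while also receiving data, even though only node 3 is new.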

When a new node joins the cluster, it's simply assigned a handful of virtual nodes that previously belonged to other machines. Rather than recalculating the physical location of each individual row, Cassandra can assign the correct number of virtual nodes to the new machine (in this case, three) and move their contents over wholesale. Unlike in a rebalancing scenario, where every physical machine is both losing and gaining data, redistributing virtual nodes only requires data to move from the original three machines to its new home on the fourth machine. Here's how the ring will now look:
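To contrast this with rebalancing, here is a minimal sketch of the virtual-node approach under toy assumptions: 1,200 tokens (one row each) split into 12 equal virtual nodes, dealt round-robin to machines 0 to 2. The helpers `vnode_of`, `add_machine`, and `vnode_transfers` are hypothetical. `add_machine` models the simplified picture described here, in which the newcomer takes over whole virtual nodes from the most heavily loaded machines until it holds its fair share. Real Cassandra instead generates new tokens for a joining node (how many is set by `num_tokens` in `cassandra.yaml`), so treat this strictly as an illustration:

```python
from collections import Counter

# Toy assumptions: tokens 0..1199 (one row per token) split into 12 equal,
# contiguous virtual nodes of 100 tokens each.
TOKEN_SPACE = 1200
NUM_VNODES = 12
VNODE_SIZE = TOKEN_SPACE // NUM_VNODES


def vnode_of(token):
    """Index of the virtual node whose token range contains `token`."""
    return token // VNODE_SIZE


def add_machine(assignment, new_machine):
    """Return a copy of a vnode -> machine map in which `new_machine` has taken
    over whole vnodes, one at a time from the most heavily loaded machine,
    until it holds its fair share. Hypothetical policy, not Cassandra's."""
    new_assignment = dict(assignment)
    machines = set(assignment.values()) | {new_machine}
    fair_share = len(assignment) // len(machines)
    for _ in range(fair_share):
        load = Counter(new_assignment.values())
        # Most-loaded existing machine; ties go to the lowest machine number.
        donor = max((m for m in machines if m != new_machine),
                    key=lambda m: (load[m], -m))
        # The donor hands over its lowest-numbered vnode, contents and all.
        vnode = min(v for v, m in new_assignment.items() if m == donor)
        new_assignment[vnode] = new_machine
    return new_assignment


def vnode_transfers(before, after):
    """Count rows that move, keyed by (source machine, destination machine)."""
    moves = Counter()
    for token in range(TOKEN_SPACE):
        v = vnode_of(token)
        if before[v] != after[v]:
            moves[(before[v], after[v])] += 1
    return moves


if __name__ == "__main__":
    before = {v: v % 3 for v in range(NUM_VNODES)}  # round-robin over machines 0-2
    after = add_machine(before, new_machine=3)
    moves = vnode_transfers(before, after)
    for (src, dst), rows in sorted(moves.items()):
        print(f"machine {src} -> machine {dst}: {rows} rows")
    print("total moved:", sum(moves.values()), "of", TOKEN_SPACE)
```

Here the new machine ends up with three of the twelve virtual nodes, one from each original machine, so 300 of the 1,200 rows move, all of them to machine 3. No data moves between the original machines, and every machine finishes with three virtual nodes.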

Note

While a treatment of virtual nodes is important to cultivate a complete understanding of how Cassandra data distribution works, it's worth emphasizing that the process of accommodating changes to cluster topology (such as adding, removing, or replacing nodes) is entirely transparent to the application. Nodes can be added to, or removed from, a live Cassandra cluster with no degradation of functionality from the application's standpoint. The same is true for unexpected changes to the cluster, such as the failure of a node, thanks to Cassandra replication, which we'll cover next.
