Data distribution in Cassandra
In a traditional relational database such as MySQL or PostgreSQL, the entire contents of the database reside on a single machine. At a certain scale, the hardware capacity of the server running the database becomes a constraint: simply migrating to more powerful hardware yields diminishing returns.
Let's imagine ourselves in this scenario: we have an application running on a single-machine database that has reached the limits of its capacity to scale vertically. In that case, we'll want to split the data between multiple machines, a process known as sharding or federation. Assuming we want to stick with the same underlying tool, we'll end up with multiple database instances, each of which holds a subset of our total data. Crucially, in this scenario the different database instances have no knowledge of each other; as far as each instance is concerned, it's simply a standalone database containing a standalone dataset.
It's up to our application to manage the distribution of data among our various database instances. Specifically, for any given piece of data, we'll need to know which instance it belongs on; when we're writing data, we need to write to the right instance, and when we're reading data, we need to read it back from that same place.
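As a rough sketch of that responsibility in Python (with the four instances stood in for by plain dictionaries, and hypothetical names such as instance_for, write_record, and read_record), application-managed routing might look something like this:

    # Four independent "instances", stood in for here by plain dictionaries.
    instances = [{}, {}, {}, {}]

    def instance_for(key):
        # Map a primary key to the instance that owns it; one simple
        # strategy for choosing this mapping is discussed next.
        return instances[key % len(instances)]

    def write_record(key, row):
        # The application, not the database, decides where the row lives.
        instance_for(key)[key] = row

    def read_record(key):
        # Reads must consult the same mapping that was used for the write.
        return instance_for(key).get(key)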
Assuming we are using integer primary keys (the standard practice in relational databases), one simple strategy is to use the modulus of the primary key. If we have four database instances, our primary key layout would look something like this:
Instance     Primary Keys
Instance 1   1, 5, 9, 13, …
Instance 2   2, 6, 10, 14, …
Instance 3   3, 7, 11, 15, …
Instance 4   4, 8, 12, 16, …
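Translating that table into code, a sketch of the modulus rule might look like the following; the helper name instance_number_for is hypothetical:

    NUM_INSTANCES = 4

    def instance_number_for(primary_key):
        # Keys 1, 5, 9, 13, ... map to instance 1; keys 4, 8, 12, 16, ... to instance 4.
        remainder = primary_key % NUM_INSTANCES
        return NUM_INSTANCES if remainder == 0 else remainder

    assert instance_number_for(13) == 1
    assert instance_number_for(16) == 4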
Of course, this strategy would likely result in great difficulty performing relational operations; it's quite likely that a foreign key stored on one instance would refer to a primary key stored on a different instance. For this reason, in a real application we would likely design a