Like a lot of start-ups, we run on Amazon Web Services. The combination of AWS and Cassandra has enabled us to achieve active-active inter-data-center replication with a small team that has relatively little experience in maintaining a Cassandra installation. If we attempted the same thing with MySQL, we would have to upgrade to the paid-for clustered product and set up sharding to aid horizontal scalability. That is not impossible, but it is certainly costlier to set up and maintain in terms of money and man-hours. In contrast, Cassandra was designed from the outset to be horizontally scalable and to handle very large, multi-terabyte clusters. When we took the cost (free) into consideration, Cassandra became a compelling proposition that we had to try.
Our typical Cassandra cluster runs across three data centers—US (east), EU
(west), Asia Pacific (Tokyo)—to provide a truly global data store. In each data
center, we run a node in each of three availability zones (AZ). We then set up our
keyspaces with a replication factor of 3; that is, one copy in each AZ. This gives
us a high level of resilience, enabling us to survive any single node or data center
failure while still supporting quorum reads and writes. If we find capacity issues,
we simply spin up some new machines, install Cassandra, and then add them to
the ring. The entire process is fast and simple. Unlike with other data storage engines we have used, we've found scaling our Cassandra clusters to be a straightforward process.
We run a number of shared-use (multitenant) and single-use Cassandra clusters. The multitenant clusters are for services with relatively light demand (in terms of both volume and throughput), typically data that is updated relatively infrequently, such as passenger data, payment details, and so on. Using a multitenant cluster for this sort of data means that, without negatively impacting the performance of our platform, we have fewer clusters to monitor and maintain. This is important in a start-up environment, where you don't necessarily have the resources, in terms of machines and people, to maintain lots of different Cassandra clusters. The single-use Cassandra clusters are reserved for processes with heavy-duty usage patterns that could impact other systems if they coexisted on the same cluster, mainly statistical data with throughput of more than 1,000 writes per second. This separation enables us to manage our usage more effectively and to target our scaling efforts more precisely. The boxes are typically m1.large instances, but our most powerful clusters are backed by SSDs for added horsepower.