Like a lot of start-ups, we run on Amazon Web Services. The combination of AWS and Cassandra has enabled us to achieve active-active inter-data-center replication with a small team that has relatively little experience in maintaining a Cassandra installation. If we attempted the same thing with MySQL, we would have to upgrade to the paid-for clustered product and set up sharding to aid horizontal scalability. That is not impossible, but it is certainly costlier to set up and maintain in terms of money and man-hours. In contrast, Cassandra was designed from the outset to be horizontally scalable and to handle very large, multi-terabyte clusters. When we took the cost (free) into consideration, Cassandra became a compelling proposition that we had to try.
Our typical Cassandra cluster runs across three data centers—US (east), EU
(west), Asia Pacific (Tokyo)—to provide a truly global data store. In each data
center, we run a node in each of three availability zones (AZ). We then set up our
keyspaces with a replication factor of 3; that is, one copy in each AZ. This gives
us a high level of resilience, enabling us to survive any single node or data center
failure while still supporting quorum reads and writes. If we find capacity issues,
we simply spin up some new machines, install Cassandra, and then add them to
the ring. The entire process is fast and simple. Unlike with other data storage engines we have used, we've found scaling our Cassandra clusters to be a straightforward process.
We run a number of shared-use (multitenant) and single-use Cassandra clusters. The multitenant clusters are for services with relatively light demand (in terms of both volume and throughput), typically data that is updated relatively infrequently, such as passenger data, payment details, and so on. Using a multitenant cluster for this sort of data means that, without negatively impacting the performance of our platform, we have fewer clusters to monitor and maintain. This is important in a start-up environment, where you don't necessarily have the resources, in terms of machines and people, to maintain lots of different Cassandra clusters. The single-use Cassandra clusters are reserved for processes with heavy-duty usage patterns that could impact other systems if they coexisted on the same cluster, mainly statistical data with throughput of more than 1,000 writes per second. This separation enables us to manage our usage more effectively and to target our scaling efforts more precisely. The boxes are typically m1.large instances, but our most powerful clusters are backed by SSDs for added horsepower.